Re: [racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]
7 hours ago, Marijn wrote:
> On 30-12-11 09:32, Eli Barzilay wrote:
>
> >> Without the capturing group the results are identical: [...]
> >
> > Which is expected.
>
> Good, just establishing a baseline here, but it is good that some
> compatibility is *expected*.

I meant that getting the same results is the thing that is expected.

> > There are probably uses for that -- at least for the simple version
> > with a single group around the whole regexp, but that's some
> > hybrid of `regexp-split' and `regexp-match*': it returns something
> > that interleaves them, which can be useful, but I'd rather see it
> > with a different name.
>
> Yes, I agree that I find it a bit weird as well.
>
> You don't lose anything by supporting this though, since you can
> always use a non-capturing group, but I do agree that it can be
> considered an inappropriate extension of the meaning of
> regexp-split. I'll be sure to raise these issues on the guile list.

We do have an important loss -- all current code that assumes the
current behavior will break. This is why I suggested a different
name -- the new function could be used to return the gaps (as split
does), the matches (as match*), the matches including groups, or all
of these.

> > We've talked semi-recently about adding an option to
> > `regexp-match*' so it can return the lists of matches for each
> > pattern, perhaps add another option for returning the unmatched
> > sequences between them, and give the whole thing a new name?
> > (Something that indicates it being the multitool version of all of
> > these.)
>
> Interesting.

BTW, one thing that I think it should do is avoid the splicing of the
group matches that python is doing.

In any case, any suggestions for a name?

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!

Racket Developers list: http://lists.racket-lang.org/dev
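A minimal Python sketch of the kind of "multitool" function discussed in the message above may make the idea concrete. It returns the gaps and the matches interleaved, but keeps each match together with its groups as a single tuple instead of splicing the groups into the flat list. The name `split_and_match` and the tuple layout are assumptions made for illustration only; this is not an existing Racket, Guile, or Python function.

```python
import re

def split_and_match(pattern, text):
    """Hypothetical helper: interleave gaps with unspliced matches.

    Even positions hold the unmatched gaps (what a plain split returns);
    odd positions hold one tuple per match: (whole match, group 1, ...).
    """
    result = []
    pos = 0
    for m in re.finditer(pattern, text):
        result.append(text[pos:m.start()])         # gap before this match
        result.append((m.group(0),) + m.groups())  # match plus its groups
        pos = m.end()
    result.append(text[pos:])                      # trailing gap
    return result

print(split_and_match(r"([^0-9]([0-9]))", "123+456*/"))
# ['123', ('+4', '+4', '4'), '56*/']
```

Because every match stays in its own tuple, a caller can always tell gaps from matches by position, however many groups the pattern has.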
Re: [racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]
On 30-12-11 09:32, Eli Barzilay wrote:
> This doesn't look like an issue that is related to guile, just that
> he chose python as the goal... The first other random example I
> tried was `split-string' in Emacs, which did the same thing as
> Racket.

They may choose python's version as the goal. It doesn't look like
they have looked very hard as of yet at what else is out there.
Probably because they are expecting compatibility between most
implementations.

>> Welcome to Racket v5.2.0.7.
>>> (regexp-split "([^0-9])" "123+456*/")
>> '("123" "456" "" "")
>>
>> should it be considered a bug in racket that it doesn't support
>> capturing groups in regexp-split?
>
> No.
>
>> Without the capturing group the results are identical: [...]
>
> Which is expected.

Good, just establishing a baseline here, but it is good that some
compatibility is *expected*. How nice is that? Since we're expecting
compatibility between python and racket, I guess it goes without
saying that racket's and guile's regexp-split should be compatible as
well. R7RS Large may standardize a regular expression library, and we
can make that easier by reducing incompatibilities between schemes.
We can all grow from examining our incompatibilities, discussing them
and sometimes resolving them.

> Python does something which is IMO very weird:
>
>   >>> re.split("([^0-9])", "123+456*/")
>   ['123', '+', '456', '*', '', '/', '']
>
> It's even more confusing with multiple patterns:
>
>   >>> re.split("([^0-9]([0-9]))", "123+456*/")
>   ['123', '+4', '4', '56*/']
>
> There are probably uses for that -- at least for the simple version
> with a single group around the whole regexp, but that's some hybrid
> of `regexp-split' and `regexp-match*': it returns something that
> interleaves them, which can be useful, but I'd rather see it with
> a different name.

Yes, I agree that I find it a bit weird as well.

You don't lose anything by supporting this though, since you can
always use a non-capturing group, but I do agree that it can be
considered an inappropriate extension of the meaning of regexp-split.
I'll be sure to raise these issues on the guile list.

> We've talked semi-recently about adding an option to
> `regexp-match*' so it can return the lists of matches for each
> pattern, perhaps add another option for returning the unmatched
> sequences between them, and give the whole thing a new name?
> (Something that indicates it being the multitool version of all of
> these.)

Interesting.

Marijn
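To illustrate the non-capturing-group point from the message above, here is a short Python sketch using only the standard `re` module and the input string from this thread. The variable names are chosen here for illustration.

```python
import re

text = "123+456*/"

# Capturing group: Python splices each captured separator into the result.
with_group = re.split(r"([^0-9])", text)

# Non-capturing group (?:...): only the gaps between matches are returned,
# which is the same result as using no group at all.
without_group = re.split(r"(?:[^0-9])", text)

print(with_group)     # ['123', '+', '456', '*', '', '/', '']
print(without_group)  # ['123', '456', '', '']
```

So in an implementation that follows Python's behavior, a pattern that needs grouping only for alternation or repetition can be written with `(?:...)` to get the plain split result back.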
[racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]
Yesterday, Marijn wrote:
> Hi,
>
> this just appeared on guile-devel, but it seems to have exposed a bug
> in racket.
>
> On 29-12-11 10:32, Nala Ginrut wrote:
> > [...]

This doesn't look like an issue that is related to guile, just that
he chose python as the goal... The first other random example I
tried was `split-string' in Emacs, which did the same thing as
Racket.

> Welcome to Racket v5.2.0.7.
> > (regexp-split "([^0-9])" "123+456*/")
> '("123" "456" "" "")
>
> should it be considered a bug in racket that it doesn't support
> capturing groups in regexp-split?

No.

> Without the capturing group the results are identical: [...]

Which is expected.

> >>> import re
> >>> re.split("[^0-9]", "123+456*/")
> ['123', '456', '', '']
>
> > (regexp-split "[^0-9]" "123+456*/")
> '("123" "456" "" "")

It was tricky to dig out what you wanted here... Python does
something which is IMO very weird:

  >>> re.split("([^0-9])", "123+456*/")
  ['123', '+', '456', '*', '', '/', '']

It's even more confusing with multiple patterns:

  >>> re.split("([^0-9]([0-9]))", "123+456*/")
  ['123', '+4', '4', '56*/']

There are probably uses for that -- at least for the simple version
with a single group around the whole regexp, but that's some hybrid
of `regexp-split' and `regexp-match*': it returns something that
interleaves them, which can be useful, but I'd rather see it with a
different name.

We've talked semi-recently about adding an option to `regexp-match*'
so it can return the lists of matches for each pattern, perhaps add
another option for returning the unmatched sequences between them,
and give the whole thing a new name? (Something that indicates it
being the multitool version of all of these.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!
[racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]
Hi,

this just appeared on guile-devel, but it seems to have exposed a bug
in racket.

On 29-12-11 10:32, Nala Ginrut wrote:
> hi guilers! It seems like there's no "regexp-split" procedure in
> Guile. What we have is "string-split" which accepts Char only. So
> I wrote one for myself.
>
> --python code--
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
> ['123', '+', '456', '*', '', '/', '']
> --code end--
>
> The Guile version:
>
> --guile code--
> (regexp-split "([^0-9])" "123+456*/")
> ==>("123" "+" "456" "*" "" "/" "")
> --code end--
>
> Anyone interested in it?

Welcome to Racket v5.2.0.7.
> (regexp-split "([^0-9])" "123+456*/")
'("123" "456" "" "")

should it be considered a bug in racket that it doesn't support
capturing groups in regexp-split?

Without the capturing group the results are identical:

>>> import re
>>> re.split("[^0-9]", "123+456*/")
['123', '456', '', '']

> (regexp-split "[^0-9]" "123+456*/")
'("123" "456" "" "")

Marijn