Re: [racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]

2011-12-30 Thread Eli Barzilay
7 hours ago, Marijn wrote:
> 
> On 30-12-11 09:32, Eli Barzilay wrote:
> 
> >> Without the capturing group the results are identical: [...]
> > 
> > Which is expected.
> 
> Good, just establishing a baseline here, but it is good that some
> compatibility is *expected*.

I meant that getting the same results is the thing that is expected.


> > There are probably uses for that -- at least for the simple version
> > with a single group around the whole regexp, but that's some
> > hybrid of `regexp-split' and `regexp-match*': it returns something
> > that interleaves them, which can be useful, but I'd rather see it
> > with a different name.
> 
> Yes, I agree that I find it a bit weird as well.
> 
> You don't lose anything by supporting this though, since you can
> always use a non-capturing group, but I do agree that it can be
> considered an inappropriate extension of the meaning of
> regexp-split.  I'll be sure to raise these issues on the guile list.

We do have an important loss -- all current code that will break
because it assumes the current behavior.  This is why I suggested a
different name -- the new function could be used to return the gaps
(as split does), the matches (as match*), the matches including
groups, or all of these.
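
As a rough illustration of that idea, here is a minimal Python sketch, not a proposal for the actual Racket API. The name `split_and_match` and its keyword arguments are hypothetical, chosen only for this example. It walks the matches once and returns the gaps, the matches, or both, and it keeps group matches nested with their match instead of splicing them into the result list.

```python
import re


def split_and_match(pattern, text, gaps=True, matches=True, groups=False):
    """Hypothetical 'multitool' over one left-to-right pass of the matches.

    gaps    -- include the unmatched text between matches (as a split does)
    matches -- include the matched text (as a match-all does)
    groups  -- report each match as (whole_match, group_tuple) rather than
               as a bare string; only has an effect when matches is true
    """
    out = []
    pos = 0
    for m in re.finditer(pattern, text):
        if gaps:
            out.append(text[pos:m.start()])
        if matches:
            # Groups stay attached to their match; nothing is spliced
            # into the top-level list.
            out.append((m.group(0), m.groups()) if groups else m.group(0))
        pos = m.end()
    if gaps:
        # Trailing gap after the last match (possibly empty).
        out.append(text[pos:])
    return out


text = "123+456*/"
only_gaps = split_and_match("[^0-9]", text, matches=False)
only_matches = split_and_match("[^0-9]", text, gaps=False)
both = split_and_match("[^0-9]", text)
nested = split_and_match("([^0-9]([0-9]))", text, groups=True)
```

With gaps and matches both enabled and a pattern without groups, `both` is the same list that Python's `re.split` returns for the pattern wrapped in a capturing group, while `nested` keeps the two group matches inside a tuple next to the match they belong to.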


> > We've talked semi-recently about adding an option to
> > `regexp-match*' so it can return the lists of matches for each
> > pattern, perhaps add another option for returning the unmatched
> > sequences between them, and give the whole thing a new name?
> > (Something that indicates it being the multitool version of all of
> > these.)
> 
> Interesting.

BTW, one thing that I think it should do is avoid the splicing of the
group matches that python is doing.

In any case, any suggestions for a name?

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!
_
  Racket Developers list:
  http://lists.racket-lang.org/dev


Re: [racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]

2011-12-30 Thread Marijn

On 30-12-11 09:32, Eli Barzilay wrote:
> This doesn't look like an issue that is related to guile, just that
> he chose python as the goal...  The first other random example I
> tried was `split-string' in Emacs, which did the same thing as
> Racket.

They may choose python's version as the goal. It doesn't look like
they have looked very hard as of yet at what else is out there.
Probably because they are expecting compatibility between most
implementations.

> 
>> Welcome to Racket v5.2.0.7.
>> > (regexp-split "([^0-9])"  "123+456*/")
>> '("123" "456" "" "")
>> 
>> should it be considered a bug in racket that it doesn't support 
>> capturing groups in regexp-split?
> 
> No.
> 
> 
>> Without the capturing group the results are identical: [...]
> 
> Which is expected.

Good, just establishing a baseline here, but it is good that some
compatibility is *expected*. How nice is that? Since we're expecting
compatibility between python and racket, I guess it goes without
saying that racket's and guile's regexp-split should be compatible as
well. R7RS Large may standardize a regular expression library, and we
can make that easier by reducing incompatibilities between schemes. We
can all grow from examining our incompatibilities, discussing them and
sometimes resolving them.

> Python does something which is IMO very weird:
> 
>   >>> re.split("([^0-9])", "123+456*/")
>   ['123', '+', '456', '*', '', '/', '']
> 
> It's even more confusing with multiple patterns:
> 
>   >>> re.split("([^0-9]([0-9]))", "123+456*/")
>   ['123', '+4', '4', '56*/']
> 
> There are probably uses for that -- at least for the simple version
> with a single group around the whole regexp, but that's some hybrid
> of `regexp-split' and `regexp-match*': it returns something that
> interleaves them, which can be useful, but I'd rather see it with
> a different name.

Yes, I agree that I find it a bit weird as well.

You don't lose anything by supporting this though, since you can
always use a non-capturing group, but I do agree that it can be
considered an inappropriate extension of the meaning of regexp-split.
I'll be sure to raise these issues on the guile list.
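
The non-capturing-group point can be checked on the Python side with a small sketch (the variable names are only for illustration): wrapping the pattern in `(?:...)` groups it without capturing, so `re.split` returns only the pieces between the matches.

```python
import re

text = "123+456*/"

# No group at all.
plain = re.split("[^0-9]", text)

# Non-capturing group: grouping only, nothing extra in the result.
non_capturing = re.split("(?:[^0-9])", text)
```

Both calls produce the same four-element list as the Racket session quoted above.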

> We've talked semi-recently about adding an option to
> `regexp-match*' so it can return the lists of matches for each
> pattern, perhaps add another option for returning the unmatched
> sequences between them, and give the whole thing a new name?
> (Something that indicates it being the multitool version of all of
> these.)

Interesting.

Marijn


[racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]

2011-12-30 Thread Eli Barzilay
Yesterday, Marijn wrote:
> Hi,
> 
> this just appeared on guile-devel, but it seems to have exposed a bug
> in racket.
> 
> On 29-12-11 10:32, Nala Ginrut wrote:
> > [...]

This doesn't look like an issue that is related to guile, just that he
chose python as the goal...  The first other random example I tried
was `split-string' in Emacs, which did the same thing as Racket.


> Welcome to Racket v5.2.0.7.
> > (regexp-split "([^0-9])"  "123+456*/")
> '("123" "456" "" "")
> 
> should it be considered a bug in racket that it doesn't support
> capturing groups in regexp-split?

No.


> Without the capturing group the results are identical: [...]

Which is expected.


> >>> import re
> >>> re.split("[^0-9]", "123+456*/")
> ['123', '456', '', '']
> 
> > (regexp-split "[^0-9]"  "123+456*/")
> '("123" "456" "" "")

It was tricky to dig out what you wanted here...  Python does
something which is IMO very weird:

  >>> re.split("([^0-9])", "123+456*/")
  ['123', '+', '456', '*', '', '/', '']

It's even more confusing with multiple patterns:

  >>> re.split("([^0-9]([0-9]))", "123+456*/")
  ['123', '+4', '4', '56*/']

There are probably uses for that -- at least for the simple version with
a single group around the whole regexp, but that's some hybrid of
`regexp-split' and `regexp-match*': it returns something that
interleaves them, which can be useful, but I'd rather see it with a
different name.
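
To make the "hybrid" concrete, here is a small Python sketch (the variable names are illustrative) that rebuilds the capturing-group result for the single-group case by interleaving a plain split with the list of all matches, using `re.findall` as a rough counterpart of `regexp-match*`.

```python
import re
from itertools import chain, zip_longest

text = "123+456*/"

gaps = re.split("[^0-9]", text)    # the pieces between matches
seps = re.findall("[^0-9]", text)  # the matches themselves

# Alternate gap, match, gap, match, ...; there is always one more gap
# than there are matches, so drop the padding added by zip_longest.
interleaved = [
    piece
    for piece in chain.from_iterable(zip_longest(gaps, seps))
    if piece is not None
]
```

For this input, `interleaved` is the same list that `re.split("([^0-9])", text)` returns.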

We've talked semi-recently about adding an option to `regexp-match*'
so it can return the lists of matches for each pattern, perhaps add
another option for returning the unmatched sequences between them, and
give the whole thing a new name?  (Something that indicates it being
the multitool version of all of these.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


[racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]

2011-12-29 Thread Marijn

Hi,

this just appeared on guile-devel, but it seems to have exposed a bug
in racket.

On 29-12-11 10:32, Nala Ginrut wrote:
> hi guilers! It seems like there's no "regexp-split" procedure in
> Guile. What we have is "string-split" which accepted Char only. So
> I wrote one for myself.
> 
> --python code-
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
> ['123', '+', '456', '*', '', '/', '']
> code end---
> 
> The Guile version:
> 
> --guile code---
> (regexp-split "([^0-9])"  "123+456*/")
> ==>("123" "+" "456" "*" "" "/" "")
> --code end
> 
> Anyone interested in it?

Welcome to Racket v5.2.0.7.
> (regexp-split "([^0-9])"  "123+456*/")
'("123" "456" "" "")

should it be considered a bug in racket that it doesn't support
capturing groups in regexp-split? Without the capturing group the
results are identical:

>>> import re
>>> re.split("[^0-9]", "123+456*/")
['123', '456', '', '']

> (regexp-split "[^0-9]"  "123+456*/")
'("123" "456" "" "")

Marijn