[Issue 7551] Regex parsing bug for right bracket in character class

2016-04-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=7551

--- Comment #7 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/afd16eac09d8178d47d2e39ba8fe87c765369fc7
Fix issue 7551 - Regex parsing bug for right bracket in character class

https://github.com/D-Programming-Language/phobos/commit/0ce66bc7aa9aee44757bda5b6102b8180dda3b19
Merge pull request #4161 from DmitryOlshansky/issue-11765

Fix issue 7551 - Regex parsing bug for right bracket in character class

--


[Issue 7551] Regex parsing bug for right bracket in character class

2012-02-27 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=7551



--- Comment #2 from Magnus Lie Hetland mag...@hetland.org 2012-02-27 00:44:59 
PST ---
It did exist in the previous version -- my code broke with the new regexp
engine, but worked before :-)

If this is a conscious choice, then that's totally fine by me. Special cases
aren't the right way to go when the general mechanism works. I had some trouble
getting this to work (did something like what you wrote here, which won't work
-- but double-escaping does, of course), so I ended up with using the
or-operator, which was kind of hackish ;-)
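
A minimal sketch of the variants mentioned above, for readers hitting the same problem. It assumes a recent Phobos where std.regex provides matchFirst (the 2012 release discussed here may differ); the helper name matches and the sample inputs are hypothetical, chosen only for illustration.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: true if `pattern` matches somewhere in `input`.
bool matches(string input, string pattern)
{
    return !matchFirst(input, regex(pattern)).empty;
}

void main()
{
    // WYSIWYG (backtick) literal: a single backslash reaches the regex engine.
    writeln(matches("a]b", `[\]]`));

    // Ordinary double-quoted literal: the backslash itself must be doubled,
    // which is the "double-escaping" case.
    writeln(matches("a]b", "[\\]]"));

    // The or-operator workaround: ']' is matched outside the class.
    writeln(matches("]", `\]|[abc]`));

    // An input with neither ']' nor a, b, c.
    writeln(matches("xyz", `\]|[abc]`));
}
```

The backtick form is usually the easier one to read, since the pattern appears exactly as the regex engine sees it.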

So, yeah, I guess I retract my bug report :-

As for other languages: Yeah, I think this is pretty common. E.g., Python
(http://docs.python.org/library/re.html) and in Perl and Perl-compatible
regexps, as used in all kinds of places, such as PHP, Apache, Safari, …
(http://www.php.net/manual/en/regexp.reference.character-classes.php).

So I think treating a closing bracket placed first in the class as a literal
member is the industry-standard behavior.

But as a compromise: Perhaps a useful error message pointing out the escape
thing could be added? Or it could be explicitly pointed out in a note in the
documentation (to avoid special-casing the error code)?

I think some kind of least-surprise handling for people coming from basically
anywhere else might be useful ;-)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


[Issue 7551] Regex parsing bug for right bracket in character class

2012-02-27 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=7551



--- Comment #3 from Magnus Lie Hetland mag...@hetland.org 2012-02-27 00:51:18 
PST ---
This whole thing goes for start brackets, too, I guess. As far as I can see,
they, too, must be escaped when used inside character classes, now. This
follows from the definition in the docs, for sure, but wasn't entirely obvious
to me -- especially given that it worked before. (I.e., that was another thing
that broke in my code recently, when upgrading.)



[Issue 7551] Regex parsing bug for right bracket in character class

2012-02-27 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=7551


Dmitry Olshansky dmitry.o...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


--- Comment #4 from Dmitry Olshansky dmitry.o...@gmail.com 2012-02-27 
02:36:06 PST ---
Full backwards compatibility looked like a nice idea at the start.
I increasingly regret that decision, as things still got broken when I had to
add new features that block some undocumented behavior.

Ehm, escape sequences were partly broken in 2.057 ... sorry about that.

BTW, this page shows that [ and ] should be escaped, with not a single word
about ] being allowed as the first character (unlike '-', which is supported).
http://www.php.net/manual/en/regexp.reference.character-classes.php

About Python, heh, I'm eager to see how they would go about adding set
operations without breaking compatibility (they count [ as a plain '[' in the
middle of a charset). I guess it would take a brand new module, if they ever do.


> But as a compromise: Perhaps a useful error message pointing out the escape
> thing could be added? Or it could be explicitly pointed out in a note in the
> documentation (to avoid special-casing the error code)?
>
> I think some kind of least surprise handling for people coming from basically
> anywhere else might be useful ;-)

Hm.. that's a good idea. Hereby it's an enhancement request ;)



[Issue 7551] Regex parsing bug for right bracket in character class

2012-02-27 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=7551



--- Comment #6 from Dmitry Olshansky dmitry.o...@gmail.com 2012-02-27 
05:18:12 PST ---
(In reply to comment #5)
> Quoting Dmitry:
> > BTW this page shows that [ and ] should be escaped, and not a single word on it
> > used as first character (unlike '-' that is supported).
> > http://www.php.net/manual/en/regexp.reference.character-classes.php
>
> Huh? Did you read the first paragraph…?-)

Searching got the better of me :( I grepped for [

> Quoted, for your convenience (my highlight):
> > An opening square bracket introduces a character class, terminated by a closing square
> > bracket. A closing square bracket on its own is not special. **If a closing square bracket is
> > required as a member of the class, it should be the first data character in the class** (after
> > an initial circumflex, if present) or escaped with a backslash.
>
> It says so right there, no? This is the way it's been in several languages I've
> used throughout the years. I guess they just didn't have escaping inside
> character classes in the olden days ;-)

Apparently it's one of these historical kind of things.



[Issue 7551] Regex parsing bug for right bracket in character class

2012-02-24 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=7551


Dmitry Olshansky dmitry.o...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.o...@gmail.com


--- Comment #1 from Dmitry Olshansky dmitry.o...@gmail.com 2012-02-24 
11:28:25 PST ---
It's perfectly fine to use escapes for special characters:

import std.regex;
void main() {
    // backtick (WYSIWYG) literal, so the backslash reaches the regex engine
    auto r = regex(`[\]]`);
}

The reason for dropping the "first bracket doesn't count" rule (if I ever knew
it existed) is that the new regex allows doing things like
[[abc0-9]--[bcd||1-9]] 
i.e. set operations 
The above should get you [a0]; it's more useful with \p{xxx} things.
Basically, brackets matter more now.
But these "many other languages" (or, better, libraries): which ones? Unless
there is strong precedent, I'm not adding another special case.
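
To make the set-operation syntax concrete, here is a small sketch that probes which single characters the class from this comment accepts. It assumes a recent Phobos with matchFirst; the helper name inClass and the probe characters are hypothetical. By plain set arithmetic, {a, b, c, 0-9} minus the union {b, c, d, 1-9} leaves the characters a and 0.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: true if the whole string `s` is one member of `cls`.
bool inClass(string s, string cls)
{
    // Anchor the class so that only a single-character input can match.
    return !matchFirst(s, regex("^" ~ cls ~ "$")).empty;
}

void main()
{
    // Set difference (--) of two nested classes; the right-hand class is
    // itself a union (||) of "bcd" and "1-9".
    enum cls = `[[abc0-9]--[bcd||1-9]]`;

    // Print membership for a few probe characters.
    foreach (s; ["a", "b", "c", "d", "0", "5"])
        writefln("%s: %s", s, inClass(s, cls));
}
```

Nested brackets are what make this possible, which is why an unescaped [ or ] inside a class can no longer be read as a plain character.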
