Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-02 Thread Bill Janssen
George Silva georger.si...@gmail.com wrote:

 If you're on Windows, you can test the native C# behavior with a program
 called Rad Software Regular Expression Designer. It's very helpful.

Thanks, George.  That looks like a useful piece of software.

Bill

___
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-02 Thread Bill Janssen
Jeff Hardy jdha...@gmail.com wrote:

 The fact that it works on CPython fairly fast indicates a bug
 somewhere, I'm just not sure if it's IronPython or Mono.

I just tried it with real MS .NET, on a 64-bit Windows 7 machine with a
new download of IronPython 2.7.  On that platform, it core-dumps (well,
ipy exits with a StackOverflowException).

Bill


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-02 Thread Jeff Hardy
On Thu, Jun 2, 2011 at 9:41 AM, Bill Janssen jans...@parc.com wrote:
 Jeff Hardy jdha...@gmail.com wrote:

 The fact that it works on CPython fairly fast indicates a bug
 somewhere, I'm just not sure if it's IronPython or Mono.

 I just tried it with real MS .NET, on a 64-bit Windows 7 machine with a
 new download of IronPython 2.7.  On that platform, it core-dumps (well,
 ipy exits with a StackOverflowException).

Any chance you could get a debugger on there and figure out where the
SOE is (IronPython or .NET)? If not, I can try to take a look if you
send the complete regex, but probably not until the weekend.

- Jeff


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-02 Thread Bill Janssen
Jeff Hardy jdha...@gmail.com wrote:

 On Thu, Jun 2, 2011 at 9:41 AM, Bill Janssen jans...@parc.com wrote:
  Jeff Hardy jdha...@gmail.com wrote:
 
  The fact that it works on CPython fairly fast indicates a bug
  somewhere, I'm just not sure if it's IronPython or Mono.
 
  I just tried it with real MS .NET, on a 64-bit Windows 7 machine with a
  new download of IronPython 2.7.  On that platform, it core-dumps (well,
  ipy exits with a StackOverflowException).
 
 Any chance you could get a debugger on there and figure out where the
 SOE is (IronPython or .NET)? If not, I can try to take a look if you
 send the complete regex, but probably not until the weekend.

Would gdb work?  I'll try.

The fact that it's different between .NET and Mono makes me guess it's
in the System::Text::RegularExpressions package.

Bill


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-02 Thread Jeff Hardy
On Thu, Jun 2, 2011 at 11:58 AM, Bill Janssen jans...@parc.com wrote:
 Would gdb work?  I'll try.

Mono's debugger might be better, if their regex engine is managed. It
looks like there's some Mono support in gdb but I've never used it.

On Windows, windbg is your friend. It's about as user-friendly as gdb,
though (take that how you want).

- Jeff


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-02 Thread L. Lee Saunders
 The fact that it's different between .NET and Mono makes me guess it's
 in the System::Text::RegularExpressions package.

If that is the case, it should be easy to test by using C#.  Just write a
little console app to test your RegEx on both Mono and MS.NET.  If that fails
then it is not a problem with IronPython but in the .NET core.
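
The same isolation can be attempted without leaving Python. The sketch below is only an illustration, not something posted in the thread: under IronPython it runs the pattern once through the re module and once directly through System.Text.RegularExpressions.Regex, bypassing IronPython's pattern rewriting. The helper names to_dotnet_syntax and compare_engines are hypothetical, and the conversion handles only the (?P<name>...) group syntax, which is an assumption that suits this particular pattern. Note also that Regex.Match searches the whole input while re.match anchors at the start, so the comparison is only meaningful for patterns that begin with ^ and single-line input.

```python
import re

try:
    # Only importable under IronPython (on .NET or Mono).
    from System.Text.RegularExpressions import Regex, RegexOptions
except ImportError:
    Regex = None  # running on CPython; the .NET half is skipped


def to_dotnet_syntax(pattern):
    # .NET spells named groups (?<name>...) rather than Python's (?P<name>...).
    # Assumption: the pattern uses no other Python-only constructs.
    return pattern.replace("(?P<", "(?<")


def compare_engines(pattern, text):
    """Return (re_matched, dotnet_matched); dotnet_matched is None on CPython."""
    re_matched = re.compile(pattern, re.MULTILINE).match(text) is not None
    dotnet_matched = None
    if Regex is not None:
        regex = Regex(to_dotnet_syntax(pattern), RegexOptions.Multiline)
        dotnet_matched = bool(regex.Match(text).Success)
    return re_matched, dotnet_matched


if __name__ == "__main__":
    print(compare_engines(r"^(?P<first>John) +(?P<last>[A-Z][A-Za-z]+)",
                          "John Smith"))
```

If the direct Regex call also spins or overflows the stack with the full pattern, that would point at the regex engine rather than at IronPython's re wrapper.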

Of course I just may be smoking something.

Lee



Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-01 Thread Jeff Hardy
On Wed, Jun 1, 2011 at 4:03 PM, Bill Janssen jans...@parc.com wrote:
 I have a large RE (223613 chars) that works fine in CPython 2.6, but

That's truly horrible, but I assume you have a good reason for it.

 seems to produce an endless loop in IronPython (see below).  I'm using
 Mono 2.10 (.NET 4.0.x) on Ubuntu, with IronPython 2.7.  Anyone have
 pointers to the differences between them?  Is
 System::Text::RegularExpressions in .NET configurable in some fashion
 that might help?

First off, is there a reason you don't use re.IGNORECASE? That would
cut the regex in half, at least.

For the most part, CPython and IronPython regexes should be fairly
compatible - IronPython takes the regex and massages it to work with
System.Text.RE, but the changes are pretty straightforward and small,
and I don't think the re you provided hits any of them. It's quite
possible that the Mono version of System.Text.RE can't handle the
expression; you could test this by saving the full regex and building a
small C# program that runs it. The regex template has a lot of
potential backtracking in it; are you sure it's not caught in a
pathological (exponential) case?

Finally, is one ginormous regex really the best way to do this? Have you
tried other approaches?
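
One possible alternative, sketched here purely as an illustration: let a small regex check only the shape of the line, then look the captured names up in a set. FIRST_NAMES is a two-name stand-in for the nltk lists and match_person is a hypothetical helper. This is not a drop-in replacement for the big pattern: it does not reproduce the backtracking that lets the original reinterpret an unknown middle name as the last name, and it assumes names consist of ASCII letters only.

```python
import re

# Hypothetical stand-in for the nltk name lists, plus the upper-case forms.
FIRST_NAMES = set(["John", "Mary"])
FIRST_NAMES |= set(n.upper() for n in list(FIRST_NAMES))

# Checks only the shape; the names themselves are validated by set lookup.
SHAPE = re.compile(
    r"^((?P<honorific>Mr|Ms|Mrs|Dr|MR|MS|MRS|DR)\.? )?"
    r"(?P<firstname>[A-Z][A-Za-z]*)"
    r"( (?P<middlename>[A-Z]\.|[A-Z][A-Za-z]*))?"
    r" +(?P<lastname>[A-Z][A-Za-z]+)",
    re.MULTILINE)


def match_person(text):
    """Return the match object if text looks like a known person name, else None."""
    m = SHAPE.match(text)
    if m is None:
        return None
    first = m.group("firstname")
    middle = m.group("middlename")
    # A first name is either a known name or a bare initial.
    if first not in FIRST_NAMES and len(first) != 1:
        return None
    # A middle name is either an initial with a period or a known name.
    if middle is not None and not middle.endswith(".") and middle not in FIRST_NAMES:
        return None
    return m


if __name__ == "__main__":
    for line in ("Mr. John Smith", "MARY J. Jones", "Mr. Bob Smith"):
        print("%s -> %s" % (line, match_person(line) is not None))
```

The regex stays a few hundred characters long regardless of how many names are in the list, so the engine's handling of huge alternations stops mattering.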

- Jeff


 I'm a .NET newbie.

 TIA,

 Bill

 --
 import sys, os, re

 try:
     # we use the name lists in nltk to create person-name matching patterns
     import nltk.data
 except ImportError:
     sys.stderr.write("Can't import nltk; can't do name lists.\n"
                      "See http://www.nltk.org/.\n")
     sys.exit(1)
 else:
     __MALE_NAME_EXCLUDES = ("Hill",
                             "Ave",
                             )
     __FEMALE_NAME_EXCLUDES = ()
     __FEMALE_NAMES = [x for x in
                       nltk.data.load("corpora/names/female.txt",
                                      format="raw").split("\n")
                       if (x and (x not in __FEMALE_NAME_EXCLUDES))]
     __FEMALE_NAMES += [x.upper() for x in __FEMALE_NAMES]
     __MALE_NAMES = [x for x in
                     nltk.data.load("corpora/names/male.txt",
                                    format="raw").split("\n")
                     if (x and (x not in __MALE_NAME_EXCLUDES))]
     __MALE_NAMES += [x.upper() for x in __MALE_NAMES]
     # ord('Z') + 1 so that the initial "Z" is included in the range
     __INITS = [chr(x) for x in range(ord('A'), ord('Z') + 1)]

 PERSON_PATTERN = re.compile(
     r"^((?P<honorific>Mr|Ms|Mrs|Dr|MR|MS|MRS|DR)\.? )?"  # honorific
     r"(?P<firstname>" +
     "|".join(__FEMALE_NAMES + __MALE_NAMES + __INITS) +  # first name
     r")"
     r"( (?P<middlename>([A-Z]\.)|(" +
     "|".join(__FEMALE_NAMES + __MALE_NAMES) +            # middle initial or name
     r")))?"
     r" +(?P<lastname>[A-Z][A-Za-z]+)",                   # space then last name
     re.MULTILINE)

 print PERSON_PATTERN.match("Mr. John Smith")


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-01 Thread Bill Janssen
Jeff Hardy jdha...@gmail.com wrote:

 On Wed, Jun 1, 2011 at 4:03 PM, Bill Janssen jans...@parc.com wrote:
  I have a large RE (223613 chars) that works fine in CPython 2.6, but
 
 That's truly horrible, but I assume you have a good reason for it.

Hi, Jeff.  Yes, I think so.

  seems to produce an endless loop in IronPython (see below).  I'm using
  Mono 2.10 (.NET 4.0.x) on Ubuntu, with IronPython 2.7.  Anyone have
  pointers to the differences between them?  Is
  System::Text::RegularExpressions in .NET configurable in some fashion
  that might help?
 
 First off, is there a reason you don't use re.IGNORECASE? That would
 cut the regex in half, at least.

Sure.  Names are sensitive to capitalization; the rule I'm implementing says
names are either capitalized or upper-case.
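
To make the difference concrete, here is a minimal sketch using a two-name stand-in for the nltk lists (the names list and both compiled patterns are illustrative only). Listing each name in capitalized and upper-case form accepts exactly those two spellings, while re.IGNORECASE would also let lower-case and mixed-case spellings through.

```python
import re

# Hypothetical stand-in for the nltk name lists.
names = ["John", "Mary"]
# Same trick as the original code: add the upper-case form of every name.
alternatives = names + [n.upper() for n in names]

strict = re.compile("^(" + "|".join(alternatives) + ")$")
loose = re.compile("^(" + "|".join(names) + ")$", re.IGNORECASE)

for candidate in ("John", "JOHN", "john", "jOhN"):
    print("%-5s strict=%-5s loose=%s" % (candidate,
                                         strict.match(candidate) is not None,
                                         loose.match(candidate) is not None))
```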

 For the most part, CPython and IronPython regexes should be fairly
 compatible - IronPython takes the regex and massages it to work with
 System.Text.RE, but the changes are pretty straightforward and small,

Are those changes documented anywhere?

 and I don't think the re you provided hits any of them. It's quite
 possible that the Mono version of System.Text.RE can't handle the
 expression; you could test this by saving the full regex and building a
 small C# program that runs it. The regex template has a lot of
 potential backtracking in it; are you sure it's not caught in a
 pathological (exponential) case?

No; all I'm sure of is that this runs in 1.2 seconds in CPython, and
takes up a core for 15 minutes (till I kill it) with IronPython/Mono.
Something is clearly hitting a bug somewhere...  I suppose I should
try it on Windows.

 Finally, is one ginormous regex really the best way to do this? Have you
 tried other approaches?

No need, until I hit .NET.  I'm used to working with a full-featured
finite-state machine (PARC's xfst; see
http://www.cis.upenn.edu/~cis639/docs/xfst.html), and was wondering if
we could do similar things with Python's RE machinery.  Long lists like
these names are often used for lists of companies or cities or such.
People's names are actually a fairly simple and short example of this :-).

Bill


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-01 Thread George Silva
If you're on Windows, you can test the native C# behavior with a program
called Rad Software Regular Expression Designer. It's very helpful.

-- 
George R. C. Silva

Desenvolvimento em GIS
http://geoprocessamento.net
http://blog.geoprocessamento.net


Re: [IronPython] differences in IronPython/CPython regular expressions?

2011-06-01 Thread Jeff Hardy
 Sure.  Names are sensitive to capitalization; the rule I'm implementing says
 names are either capitalized or upper-case.

Ah, I see that now. I assumed the name lists were in lower case.


 For the most part, CPython and IronPython regexes should be fairly
 compatible - IronPython takes the regex and massages it to work with
 System.Text.RE, but the changes are pretty straightforward and small,

 Are those changes documented anywhere?

The code is in Languages\IronPython\IronPython.Modules\re.cs in the
PreParseRegex function; it's pretty straightforward, if a little long.
Looking at it again, it's quite possible there's a bug in there, but
we'd need a minimal repro to have any hope of finding it.
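
One way to hunt for such a repro, sketched under assumptions: build a pattern of the same shape from a synthetic name list and time it at increasing sizes under both CPython and ipy. The helpers build_person_pattern and time_match and the filler names are hypothetical; the real lists come from nltk, and the initials list is left out for brevity.

```python
import re
import time


def build_person_pattern(first_names):
    """Assemble a pattern shaped like PERSON_PATTERN from a list of names."""
    names = first_names + [n.upper() for n in first_names]
    alternation = "|".join(names)
    return (r"^((?P<honorific>Mr|Ms|Mrs|Dr|MR|MS|MRS|DR)\.? )?"
            r"(?P<firstname>" + alternation + r")"
            r"( (?P<middlename>([A-Z]\.)|(" + alternation + r")))?"
            r" +(?P<lastname>[A-Z][A-Za-z]+)")


def time_match(name_count, text="Mr. John Smith"):
    """Return (pattern length, seconds for compile plus match, match object)."""
    # Synthetic filler names; "John" goes last so every filler is tried first.
    names = ["Name%05d" % i for i in range(name_count)] + ["John"]
    pattern = build_person_pattern(names)
    start = time.time()
    match = re.compile(pattern, re.MULTILINE).match(text)
    return len(pattern), time.time() - start, match


if __name__ == "__main__":
    for count in (10, 100, 1000):
        size, seconds, match = time_match(count)
        print("%5d names, %7d chars, %.3fs, matched=%s"
              % (count, size, seconds, match is not None))
```

The smallest name count at which the two interpreters diverge sharply would be a reasonable starting point for a bug report.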

 No need, until I hit .NET.  I'm used to working with a full-featured
 finite-state machine (PARC's xfst; see
 http://www.cis.upenn.edu/~cis639/docs/xfst.html), and was wondering if
 we could do similar things with Python's RE machinery.  Long lists like
 these names are often used for lists of companies or cities or such.
 People's names are actually a fairly simple and short example of this :-).

The fact that it works on CPython fairly fast indicates a bug
somewhere, I'm just not sure if it's IronPython or Mono.

- Jeff