[issue1160] Medium size regexp crashes python

2012-11-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 10314c9b7c5a by Antoine Pitrou in branch '2.7':
Issue #1160: Fix compiling large regular expressions on UCS2 builds.
http://hg.python.org/cpython/rev/10314c9b7c5a

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1160
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1160] Medium size regexp crashes python

2012-11-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset a3579d766fb6 by Antoine Pitrou in branch '3.2':
Issue #1160: Fix compiling large regular expressions on UCS2 builds.
http://hg.python.org/cpython/rev/a3579d766fb6

New changeset 8b73a069ae4f by Antoine Pitrou in branch '3.3':
Merge test from issue #1160.
http://hg.python.org/cpython/rev/8b73a069ae4f

New changeset 10c1f9e05f4f by Antoine Pitrou in branch 'default':
Merge test from issue #1160.
http://hg.python.org/cpython/rev/10c1f9e05f4f

--




[issue1160] Medium size regexp crashes python

2012-11-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I've committed the patch to 3.2 and 2.7, and added the test to 3.3 and default. 
Thank you!

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed




[issue1160] Medium size regexp crashes python

2012-11-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

If we consider it important enough to fix, here are patches.

--
keywords: +patch
Added file: http://bugs.python.org/file27957/sre_code_ucs4-2.7.patch




[issue1160] Medium size regexp crashes python

2012-11-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file27958/sre_code_ucs4-3.2.patch




[issue1160] Medium size regexp crashes python

2012-11-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
stage: test needed - patch review
Added file: http://bugs.python.org/file27959/sre_code_ucs4_test.patch




[issue1160] Medium size regexp crashes python

2012-06-30 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> This has probably been fixed in 3.3 in c67b7e0c818a.

Then the issue may be closed?

--
nosy: +storchaka




[issue1160] Medium size regexp crashes python

2012-06-30 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Then the issue may be closed?

Well, it's still present in 2.7 and 3.2 (assuming we consider it important
enough to fix).

--




[issue1160] Medium size regexp crashes python

2012-06-24 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

This has probably been fixed in 3.3 in c67b7e0c818a.

--
assignee: effbot - 
versions: +Python 3.2 -Python 3.1




[issue1160] Medium size regexp crashes python

2010-02-09 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

As stated in msg73781, this is being addressed in issue #2636.

My regex module handles the test case without complaint:

>>> import regex
>>> r = regex.compile('|'.join('%d'%x for x in range(7000)))
>>> r.match("1000")
<_regex.REGEX_Match object at 0x015D2920>
>>> r.match("6999")
<_regex.REGEX_Match object at 0x016DDC20>

--
nosy: +mrabarnett




[issue1160] Medium size regexp crashes python

2009-11-15 Thread Michael K Johnson

Michael K Johnson a1237+pyb...@danlj.org added the comment:

The test case at the top of this issue reproduces just fine; if you are
looking for a different test case you'll have to specify what you don't
like about it so that it's clear what you are looking for.

I don't think there's any mystery about this issue; it seems perfectly
well understood.  I commented merely to suggest one way of using sets to
others who run into the same case I did: trying to use a regular
expression to match a candidate string against a large set of exact
strings.

I was doing this because the interface I was working with was
originally meant to accept small, hand-specified regular expressions.
It was later also wrapped by code that generates regular expressions
automatically, while it still accepts hand-crafted ones.  That's why the
interface was not designed around sets, and why it was not appropriate
to simply change it to use sets.  However, my interface also accepts a
callable that resolves the object at the time of use, so I passed a
reference to a method that returned an object derived from set that
implements the match and search methods.

If you REALLY want a simpler reproducer, this does it for me in the
restricted case (i.e., on a build not using UCS4):
>>> import re
>>> r = re.compile('|'.join(('%d'%x for x in range(7000))))

But I really don't think that additional test cases are a barrier here.

Again, my goal was merely to suggest an easy way to replace
machine-generated regexps that match against exact strings with sets:
subclass set and add the necessary methods, such as search and/or
match.

--




[issue1160] Medium size regexp crashes python

2009-11-13 Thread Michael K Johnson

Michael K Johnson a1237+pyb...@danlj.org added the comment:

I also ran into this issue, and dealt with it as suggested here by
changing to sets.  Because I have underlying code that has to deal both
with small hand-crafted regular expressions and arbitrarily-large
machine-generated sets of paths, I subclassed set and implemented the
match and search methods in my subclass so that the underlying code
would work both against the hand-generated regular expressions and the
machine-generated sets of paths.  Hope this helps someone else who runs
into this restriction.

--
nosy: +johnsonm




[issue1160] Medium size regexp crashes python

2009-11-13 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Michael, can you provide the regex or even better a testcase that shows
the problem?

--
stage:  - test needed




[issue1160] Medium size regexp crashes python

2009-10-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Kristall, can you post the troublesome regular expression?

--
nosy: +pitrou




[issue1160] Medium size regexp crashes python

2009-10-18 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
type: crash - behavior
versions: +Python 3.1




[issue1160] Medium size regexp crashes python

2009-10-17 Thread kristall

kristall krist...@c-base.org added the comment:

aloha,

I also use large REs. re.compile() worked fine under Python 2.6 (OS:
Ubuntu Linux). After moving the code to Python 3.0 I get the same error
as ostkamp did, and under 3.1 as well. Under 3.1 I tried the fix that
ostkamp described (changing 'short' to 'long' in Modules/sre.h) and
rebuilt Python, but the error still occurs. I want to move to 3.x
since my variables contain German text with umlauts (ä, ö, ü etc.) and
it's a pain to work with Unicode under 2.x. Is there anything else I can
try, or is there a planned date for fixing this bug? Thanks in advance
for any help.

kristall

--
nosy: +kristall




[issue1160] Medium size regexp crashes python

2008-09-25 Thread Jeffrey C. Jacobs

Changes by Jeffrey C. Jacobs [EMAIL PROTECTED]:


--
nosy: +timehorse




[issue1160] Medium size regexp crashes python

2008-09-25 Thread Jeffrey C. Jacobs

Jeffrey C. Jacobs [EMAIL PROTECTED] added the comment:

It seems that changing the size type of the regular expression byte code
is a nice quick fix, even though it doubles the size of a pattern.  It
may have the added benefit that most machine architectures available
today are at least partially, if not fully, 32-bit oriented, so that
retrieving op codes may in fact be faster if we make this change.  OTOH,
it implies something interesting IMHO about the repeat count limits we
currently have.  Repeat counts can be explicitly set up to 65534 times
because 65535, being the largest number you can express in a 16-bit
unsigned integer, is currently reserved to mean Infinite.  It seems to
me this is a great opportunity to set that limit to (unsigned long)-1,
since that repeat count is incredibly large.
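The arithmetic behind these limits can be checked with a short Python sketch (the constant names are hypothetical, not taken from the sre sources). It also shows why (unsigned long)-1 is platform dependent: its width follows the C `unsigned long` of the build, which ctypes exposes.

```python
import ctypes

# With a 16-bit SRE_CODE the all-ones value 0xFFFF is reserved for "Infinite",
# so the largest explicit repeat count is one less than that.
MAX_CODE_16 = 0xFFFF
MAX_EXPLICIT_REPEAT_16 = MAX_CODE_16 - 1

# (unsigned long)-1: ctypes wraps -1 to all ones (no overflow checking), and the
# width is whatever the platform's C unsigned long is (32 or 64 bits).
ulong_bits = 8 * ctypes.sizeof(ctypes.c_ulong)
ulong_all_ones = ctypes.c_ulong(-1).value

print(MAX_EXPLICIT_REPEAT_16)  # 65534
print(ulong_bits, hex(ulong_all_ones))
```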

OTOH, if size is an issue, we could change the way sizes are expressed
in the Regexp Op Codes (typically in skip counts) to be 15-bit, with the
Most Significant Bit being reserved for 'extended' expressions.  In this
way, a value of 0x could be expressed as:

0x 0x 0x0003

Of course, parsing numbers in this form is a pain, to say the least, and
unlike in Python, the C library would not play nicely if someone tried
to express a number that could not fit into what the architecture
defined an int to be.  Plus, there is the problem of how you express
Infinite with this scheme.  The advantage though would be we don't have
to change the op-code size and these 'extended' counts would be very
rare indeed.
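A rough sketch of the 15-bit "extended" count idea, written in Python for readability rather than as the C the engine would need. The word order (least-significant chunk first) and the use of 0x8000 as a "more words follow" flag are assumptions, and the function names are hypothetical. Values below 0x8000 still take a single word, which is why such extended counts would be rare.

```python
def encode_extended(n):
    """Split a non-negative integer into 16-bit words holding 15 payload bits each.

    Assumed layout: least-significant chunk first; the MSB (0x8000) of a word
    is set when more words follow.
    """
    if n < 0:
        raise ValueError("counts must be non-negative")
    words = []
    while True:
        chunk = n & 0x7FFF
        n >>= 15
        if n:
            words.append(chunk | 0x8000)  # continuation flag: more words follow
        else:
            words.append(chunk)  # final word: MSB clear
            return words


def decode_extended(words):
    """Inverse of encode_extended; returns (value, number_of_words_consumed)."""
    value = 0
    shift = 0
    for i, word in enumerate(words):
        value |= (word & 0x7FFF) << shift
        shift += 15
        if not word & 0x8000:
            return value, i + 1
    raise ValueError("truncated extended count")


print([hex(w) for w in encode_extended(0xFFFFFFFF)])  # ['0xffff', '0xffff', '0x3']
print(decode_extended([0xFFFF, 0xFFFF, 0x0003]))       # (4294967295, 3)
```

The decoder shows the parsing cost mentioned above: every skip or repeat count read by the engine would need this loop instead of a single word load.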

Overall, I'm more of an Occam's Razor fan in that the simplest solution
is probably the best: just change the op-code size to unsigned long
(which, on SOME architectures, would actually make it 64 bits!) and
define the 'Infinite' constant as (unsigned long)-1.  Mind you, I prefer
defining the constant in Python, not C, and it would be hard for Python
to determine that particular value since Python is meant to be 'the
same' regardless of the underlying architecture, but that's another issue.

Anyway, as 2.6 is in Beta, this will have to wait for Python 2.7 / 3.1,
and so I will add an item to Issue 2636 with respect to it.

--
versions: +Python 2.7 -Python 2.5




[issue1160] Medium size regexp crashes python

2008-09-02 Thread Adi

Adi [EMAIL PROTECTED] added the comment:

Is there any progress on this bug? I am currently considering
maintaining a branch of 2.5.2 which includes the patch suggested by effbot.

--
nosy: +adi




[issue1160] Medium size regexp crashes python

2008-04-24 Thread Russ Cox

Changes by Russ Cox [EMAIL PROTECTED]:


--
nosy: +rsc




[issue1160] Medium size regexp crashes python

2008-04-20 Thread Greg Detre

Greg Detre [EMAIL PROTECTED] added the comment:

Dear all,

I've just switched from linux to a mac, and I'm suddenly starting to
experience this issue with a machine-generated regexp that I depend on.
Are there any plans to fix this in a future version of python?

Thank you,
   Greg

--
nosy: [EMAIL PROTECTED]




[issue1160] Medium size regexp crashes python

2008-02-07 Thread Guilherme Polo

Guilherme Polo added the comment:

I tried Fredrik's solution against trunk and it works. I compiled
Python with UCS2, so it is definitely setting SRE_CODE to unsigned long.
Before this change I got the same exception as reported by Guido Ostkamp.

--
nosy: +gpolo




[issue1160] Medium size regexp crashes python

2008-01-11 Thread A.M. Kuchling

A.M. Kuchling added the comment:

Trying effbot's suggested experiment is easy, at least, and would
provide useful info.  If it fails, then fixing this bug might be difficult.

--
keywords: +easy
nosy: +akuchling




[issue1160] Medium size regexp crashes python

2007-11-28 Thread mathieu Clabaut

Changes by mathieu Clabaut:


--
nosy: +mathieu.clabaut




[issue1160] Medium size regexp crashes python

2007-10-07 Thread Brett Cannon

Changes by Brett Cannon:


--
priority:  - normal




[issue1160] Medium size regexp crashes python

2007-09-23 Thread Fredrik Lundh

Fredrik Lundh added the comment:

Well, I'm not sure 81k qualifies as medium sized, really.  If you look
at the size distribution for typical RE:s (which are usually
handwritten, not machine generated), that's one or two orders of
magnitude larger than medium.

(And even if this were guaranteed to work on all Python builds, my guess
is that performance would be pretty bad compared to using a minimal RE
and checking potential matches against a set.  The | operator is
mostly O(N), not O(1).)
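A sketch of that alternative, applied to the reporter's 80-character hex patterns and assuming every pattern has that fixed shape, so the first 80 characters of a match are the whole candidate: a tiny RE recognises the shape, and a set lookup decides whether the candidate is one of the wanted strings. The helper name `rematch` mirrors the test program, but the body is a substitute for the large alternation, not the original code.

```python
import random
import re


def randhexstring():
    # Same shape as the reporter's patterns: 20 groups of 4 hex digits = 80 chars.
    return "".join("%04x" % random.randint(0, 0xFFFF) for _ in range(20))


pats = [randhexstring() for _ in range(1000)]
wanted = set(pats)

# Minimal RE that only recognises the *shape* of a candidate...
candidate = re.compile(r"[0-9a-f]{80}")


def rematch(s):
    # ...followed by an average O(1) set lookup for the exact value.
    m = candidate.match(s)
    return m is not None and m.group() in wanted


count = sum(1 for s in pats * 100 if rematch(s))
print(count)  # 100000
```

The compiled pattern stays tiny no matter how many strings are wanted, so the code size limit is never approached.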

As for fixing this, the byte code used by the RE engine uses a word
size equal to the Unicode character size (sizeof(Py_UNICODE)) for the
given platform.  I don't think it would be that hard to set it to 32
bits also on platforms using 16-bit Unicode characters (if anyone would
like to experiment, just set SRE_CODE to unsigned long in sre.h and
see what happens when you run the test suite).
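To tell which case a given interpreter falls into, `sys.maxunicode` is a convenient proxy: it is 0xFFFF on narrow (UCS2) builds and 0x10FFFF on wide builds and on every CPython 3.3 or later. The helper names below are hypothetical, and the 16/32 mapping describes SRE_CODE only as it was before this issue was fixed.

```python
import sys


def is_narrow_build():
    # Narrow (UCS2) builds cap code points at U+FFFF; wide (UCS4) builds,
    # and every CPython 3.3+ (PEP 393), report U+10FFFF.
    return sys.maxunicode == 0xFFFF


def pre_fix_sre_code_bits():
    # Width SRE_CODE had before issue 1160 was fixed:
    # 'unsigned short' on narrow builds, Py_UCS4 on wide ones.
    return 16 if is_narrow_build() else 32


print(is_narrow_build(), pre_fix_sre_code_bits())
```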




[issue1160] Medium size regexp crashes python

2007-09-14 Thread Guido van Rossum

Guido van Rossum added the comment:

/F?

--
assignee:  - effbot
nosy: +effbot, gvanrossum




[issue1160] Medium size regexp crashes python

2007-09-13 Thread Guido Ostkamp

New submission from Guido Ostkamp:

Hello,

a medium size regular expression crashes Python 2.5.1 as follows:

Traceback (most recent call last):
  File "./regtest.py", line 14, in <module>
    m = rematch(pats)
  File "./regtest.py", line 12, in rematch
    return re.compile(pat).match
  File "/export/home/ostkamp/local/lib/python2.5/re.py", line 180, in compile
    return _compile(pattern, flags)
  File "/export/home/ostkamp/local/lib/python2.5/re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/export/home/ostkamp/local/lib/python2.5/sre_compile.py", line 530, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded


This is apparently caused by some code in Modules/_sre.c and
Modules/sre.h as follows:

self->code[i] = (SRE_CODE) value;
if ((unsigned long) self->code[i] != value) {
    PyErr_SetString(PyExc_OverflowError,
                    "regular expression code size limit exceeded");
    break;
}

An 'unsigned int' value is unnecessarily squeezed into an 'unsigned
short' field defined in sre.h:

#ifdef Py_UNICODE_WIDE
#define SRE_CODE Py_UCS4
#else
#define SRE_CODE unsigned short
#endif

On all systems I'm working on (SuSE Linux SLES 9, Solaris 8 etc.) the
else case of the ifdef applies which chooses 'unsigned short'.

I don't understand the relationship between 'unicode' and what is
apparently the size of the regular expression stack here.

Some experiments have shown that changing the 'unsigned short' to
'unsigned long' and rebuilding Python fixes the problem.
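The truncation check quoted above can be imitated in Python with ctypes, whose integer types, like the C cast, silently drop high bits without overflow checking. `store_code_word` is a hypothetical helper for illustration, not a function in `_sre`; it shows why a code value of 70000 fails with a 16-bit SRE_CODE but fits once the word is 32 bits wide.

```python
import ctypes


def store_code_word(value, ctype=ctypes.c_ushort):
    """Mimic `self->code[i] = (SRE_CODE) value` followed by the round-trip check.

    ctypes performs no overflow checking, so c_ushort silently keeps only the
    low 16 bits, just like the C cast to 'unsigned short'.
    """
    stored = ctype(value).value
    if stored != value:
        raise OverflowError("regular expression code size limit exceeded")
    return stored


print(store_code_word(65535))                   # 65535
print(store_code_word(70000, ctypes.c_uint32))  # 70000: fits in a 32-bit word
try:
    store_code_word(70000)  # 70000 & 0xFFFF == 4464, so the round trip fails
except OverflowError as exc:
    print("OverflowError:", exc)
```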

Here is a test program to reproduce the error:

#!/usr/bin/env python
import re, random, sys

def randhexstring():
    return "".join(["%04x" % random.randint(0, 0xFFFF) for x in range(20)])

pats = [randhexstring() for x in range(1000)]


def rematch(pats):
    pat = '(?:%s)' % '|'.join(pats)
    return re.compile(pat).match

m = rematch(pats)
count = 0
for s in pats * 100:
    if m(s):
        count += 1
print count



Regards

Guido

--
components: Regular Expressions
messages: 55885
nosy: ostkamp
severity: normal
status: open
title: Medium size regexp crashes python
type: crash
versions: Python 2.5
