regex multiple patterns in order

2014-01-20 Thread km
matches of CAA, followed by four matches of TCT followed by 2 matches of TA? Well these patterns (CAA/TCT/TA) can occur any number of times and at least once so I have to use + in the regex. Please let me know. Thanks! Regards, Krishna mohan -- https://mail.python.org/mailman/listinfo/python-list

Re: regex multiple patterns in order

2014-01-20 Thread Devin Jeanpierre
find only one instance of the CAA/TCT/TA in that order. How can I get 3 matches of CAA, followed by four matches of TCT followed by 2 matches of TA? Well these patterns (CAA/TCT/TA) can occur any number of times and at least once so I have to use + in the regex. You want to include

Re: regex multiple patterns in order

2014-01-20 Thread Chris Angelico
matches of TCT followed by 2 matches of TA? Well these patterns (CAA/TCT/TA) can occur any number of times and at least once so I have to use + in the regex. You're capturing the single instance, not the repeated one. It is matching against all three CAA units, but capturing just the first. Try

Re: regex multiple patterns in order

2014-01-20 Thread Ben Finney
km srikrishnamo...@gmail.com writes: I am trying to find sub sequence patterns but constrained by the order in which they occur There are also specific resources for understanding and testing regex patterns, such as URL:http://www.pythonregex.com/. For example p = re.compile('(CAA)+?(TCT

Re: regex multiple patterns in order

2014-01-20 Thread km
resources for understanding and testing regex patterns, such as URL:http://www.pythonregex.com/. For example p = re.compile('(CAA)+?(TCT)+?(TA)+?') p.findall('CAACAACAATCTTCTTCTTCTTATATA') [('CAA', 'TCT', 'TA')] But I instead find only one instance of the CAA/TCT/TA in that order

Re: regex multiple patterns in order

2014-01-20 Thread Roy Smith
(in this case, I want groups 0, 2, and 4). I also left off the outer ?s, because I think this better represents the intent. The pattern '((CAA)+)?((TCT)+)?((TA)+)?' matches, for example, an empty string; I suspect that's not what was intended. Be aware that regex is not the solution to all parsing
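
The point made in this thread (a repeated group such as (CAA)+ only keeps one repetition, so the whole run needs its own outer group) can be sketched as follows. This is a minimal illustration and not the code from the truncated replies; the names seq, runs and counts are made up for the example.

```python
import re

# Sequence taken from the original question in this thread.
seq = "CAACAACAATCTTCTTCTTCTTATATA"

# Each outer group captures the whole run; each inner group only
# remembers the last repetition of its unit.
pattern = re.compile(r"((CAA)+)((TCT)+)((TA)+)")

match = pattern.search(seq)

# Groups 1, 3 and 5 are the outer groups, i.e. the complete runs.
runs = match.group(1, 3, 5)

# Dividing each run length by its unit length gives the repeat count.
counts = (len(runs[0]) // 3, len(runs[1]) // 3, len(runs[2]) // 2)

print(runs)
print(counts)
```

Dropping the outer ? quantifiers, as suggested above, means the pattern no longer matches an empty string.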

Re: regex multiple patterns in order

2014-01-20 Thread Neil Cerutti
On 2014-01-20, Roy Smith r...@panix.com wrote: In article mailman.5748.1390216721.18130.python-l...@python.org, Ben Finney ben+pyt...@benfinney.id.au wrote: Be aware that regex is not the solution to all parsing problems; for many parsing problems it is an attractive but inappropriate tool

Re: regex multiple patterns in order

2014-01-20 Thread Mark Lawrence
On 20/01/2014 16:04, Neil Cerutti wrote: On 2014-01-20, Roy Smith r...@panix.com wrote: In article mailman.5748.1390216721.18130.python-l...@python.org, Ben Finney ben+pyt...@benfinney.id.au wrote: Be aware that regex is not the solution to all parsing problems; for many parsing problems

Re: regex multiple patterns in order

2014-01-20 Thread Devin Jeanpierre
comfortable with it. You don't have to, there's always the new regex module that's been on pypi for years. Or are you saying that you'd like to use regex but other influences that are outside of your sphere of control prevent you from doing so? I don't see any way in which someone

Re: regex multiple patterns in order

2014-01-20 Thread Neil Cerutti
in Python I have to contend with the re module. I've never become comfortable with it. You don't have to, there's always the new regex module that's been on pypi for years. Or are you saying that you'd like to use regex but other influences that are outside of your sphere of control prevent you

Re: regex multiple patterns in order

2014-01-20 Thread Rustom Mody
have to contend with the re module. I've never become comfortable with it. You don't have to, there's always the new regex module that's been on pypi for years. Or are you saying that you'd like to use regex but other influences that are outside of your sphere of control prevent you from

Re: regex multiple patterns in order

2014-01-20 Thread Mark Lawrence
. But when I want to use them in Python I have to contend with the re module. I've never become comfortable with it. You don't have to, there's always the new regex module that's been on pypi for years. Or are you saying that you'd like to use regex but other influences that are outside of your sphere

Re: regex multiple patterns in order

2014-01-20 Thread Mark Lawrence
with gvim. But when I want to use them in Python I have to contend with the re module. I've never become comfortable with it. You don't have to, there's always the new regex module that's been on pypi for years. Or are you saying that you'd like to use regex but other influences that are outside

[issue20283] Wrong keyword parameter name in regex pattern methods

2014-01-17 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- keywords: +patch Added file: http://bugs.python.org/file33508/sre_pattern_string_keyword.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20283

[issue20283] Wrong keyword parameter name in regex pattern methods

2014-01-17 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file33508/sre_pattern_string_keyword.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20283 ___

[issue20283] Wrong keyword parameter name in regex pattern methods

2014-01-17 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file33509/sre_pattern_string_keyword.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20283 ___

[issue20283] Wrong keyword parameter name in regex pattern methods

2014-01-17 Thread Terry J. Reedy
Terry J. Reedy added the comment: How nasty. I agree that this is a code bug. Unfortunately in this case, the C code does keyword matching of arguments and 'corrects' the doc for anyone who tries 'string='. pat.search(string='xabc', pos=1) Traceback (most recent call last): File pyshell#6,

[issue20283] Wrong keyword parameter name in regex pattern methods

2014-01-16 Thread Serhiy Storchaka
New submission from Serhiy Storchaka: Documented (in docstring and in ReST documentation) signatures of the match, search and (since 3.4) fullmatch methods of regex pattern object are: match(string[, pos[, endpos]]) search(string[, pos[, endpos]]) fullmatch(string[, pos[, endpos]]) However
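
A small sketch of what the report is about, assuming nothing beyond the documented signature search(string[, pos[, endpos]]). Passing the arguments positionally works regardless of which keyword name a given interpreter accepts; whether string= is accepted depends on the Python version, so the keyword call is guarded here rather than assumed.

```python
import re

pat = re.compile("abc")

# Positional arguments do not depend on the keyword name at all.
positional = pat.search("xabc", 1)

# The keyword form is the one the issue is about; on an interpreter
# where the C code expects a different name this raises TypeError.
try:
    by_keyword = pat.search(string="xabc", pos=1)
except TypeError:
    by_keyword = None

print(positional.group(), positional.start())
print(by_keyword)
```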

[issue20145] unittest.assert*Regex functions should verify that expected_regex has a valid type

2014-01-11 Thread Terry J. Reedy
Changes by Terry J. Reedy tjre...@udel.edu: -- nosy: +ezio.melotti, michael.foord stage: - test needed type: behavior - enhancement versions: -Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4 ___ Python tracker rep...@bugs.python.org

[issue20145] unittest.assert*Regex functions should verify that expected_regex has a valid type

2014-01-06 Thread the mulhern
New submission from the mulhern: A normal thing for a developer to do is to convert a use of an assert* function to a use of an assert*Regex function and foolishly forget to actually specify the expected regular expression. If they do this, the test will always pass because the callable

[issue2679] email.feedparser regex duplicate

2014-01-04 Thread moijes12
Changes by moijes12 moije...@gmail.com: -- nosy: -moijes12 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2679 ___ ___ Python-bugs-list mailing

Re: status of regex modules

2013-12-05 Thread Mark Lawrence
? Note that I've no direct interest as I rarely if ever use the little perishers, I just find this situation bizarre. It is definitely unfortunate and even embarrassing. At one time, the hangup was a minor feature incompatibility between re and regex. Guido was reluctant to make a switch that would

Re: status of regex modules

2013-12-04 Thread Mark Lawrence
On 24/10/2013 22:47, Mark Lawrence wrote: The new module is now five years old. PEP 429 Python 3.4 release schedule has it listed under Other proposed large-scale changes but I don't believe this is actually happening. Lots of issues on the bug tracker have been closed as fixed in the new

Re: status of regex modules

2013-12-04 Thread Terry Reedy
and even embarrassing. At one time, the hangup was a minor feature incompatibility between re and regex. Guido was reluctant to make a switch that would occasionally break code. I believe that this is fixed -- by deciding to call it regex rather than re. My impression from http

[issue13592] repr(regex) doesn't include actual regex

2013-11-25 Thread Roundup Robot
Roundup Robot added the comment: New changeset 4ba7a29fe02c by Ezio Melotti in branch 'default': #13592, #17087: add whatsnew entry about regex/match object repr improvements. http://hg.python.org/cpython/rev/4ba7a29fe02c -- ___ Python tracker rep

[issue13592] repr(regex) doesn't include actual regex

2013-11-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is fixed and simplified patch. -- Added file: http://bugs.python.org/file32806/issue13592_add_repr_to_regex_v3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13592

[issue13592] repr(regex) doesn't include actual regex

2013-11-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: * re.UNICODE omitted for string patterns. * Long patterns are truncated. -- Added file: http://bugs.python.org/file32807/issue13592_add_repr_to_regex_v4.patch ___ Python tracker rep...@bugs.python.org

[issue13592] repr(regex) doesn't include actual regex

2013-11-23 Thread Roundup Robot
Roundup Robot added the comment: New changeset 8c00677da6c0 by Serhiy Storchaka in branch 'default': Issue #13592: Improved the repr for regular expression pattern objects. http://hg.python.org/cpython/rev/8c00677da6c0 -- nosy: +python-dev ___ Python

[issue13592] repr(regex) doesn't include actual regex

2013-11-23 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13592

[issue13592] repr(regex) doesn't include actual regex

2013-11-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you Hugo for your contribution. Thank you Thomas and Ezio for your reviews and suggestions. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13592

Re: splitting file/content into lines based on regex termination

2013-11-09 Thread Piet van Oostrum
bruce badoug...@gmail.com writes: hi. thanks for the reply. tried what you suggested. what I see now, is that I print out the lines, but not the regex data at all. my initial try, gave me the line, and then the next items , followed by the next line, etc... exp = re.compile(r(br#\d+\s*/\s

splitting file/content into lines based on regex termination

2013-11-07 Thread bruce
hi. got a test file with the sample content listed below: the content is one long string, and needs to be split into separate lines I'm thinking the pattern to split on should be a kind of regex like:: br#45 / 58#0# or br#9 / 58#0 but i have no idea how to make this happen!! if i read

Re: splitting file/content into lines based on regex termination

2013-11-07 Thread bruce
so i'd have the results of the compile/regex process to be added to the split lines thoughts/comments?? thanks On Thu, Nov 7, 2013 at 12:15 PM, bruce badoug...@gmail.com wrote: hi. got a test file with the sample content listed below: the content is one long string, and needs

Re: splitting file/content into lines based on regex termination

2013-11-07 Thread MRAB
#09:00ambr#09:50ambr#3718 HBLL 9 / 58,0 so i'd have the results of the compile/regex process to be added to the split lines thoughts/comments?? thanks The split method also returns what's matched in any capture groups, i.e. (\d+). Try omitting the parentheses: dat = re.compile(rbr#\d+ / \d
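
MRAB's remark about capture groups in split can be shown with a short sketch. The sample string below is invented for illustration (the original test file is not shown in full), and the pattern is a simplified guess at the br#N / N separator described in the question.

```python
import re

# Hypothetical sample data, loosely modelled on the separators in the thread.
data = "line one br#45 / 58 line two br#9 / 58 line three"

# With a capture group, re.split also returns what the group matched.
with_group = re.split(r"br#(\d+) / \d+", data)

# Without the parentheses, only the text between separators is returned.
without_group = re.split(r"br#\d+ / \d+", data)

print(with_group)
print(without_group)
```

Wrapping the whole separator in one group would return the complete separator text between the pieces, which may be closer to what the original poster wanted.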

Re: splitting file/content into lines based on regex termination

2013-11-07 Thread bruce
hi. thanks for the reply. tried what you suggested. what I see now, is that I print out the lines, but not the regex data at all. my initial try, gave me the line, and then the next items , followed by the next line, etc... what I then tried, was to do a capture/findall of the regex

Re: Parsing multiple lines from text file using regex

2013-11-03 Thread Jason Friedman
Hi, I am having an issue with something that would seem to have an easy solution, but which escapes me. I have configuration files that I would like to parse. The data I am having issue with is a multi-line attribute that has the following structure: banner option banner text delimiter

RE: Parsing multiple lines from text file using regex

2013-11-03 Thread Marc
This is an alternative solution someone else posted on this list for a similar problem I had: #!/usr/bin/python3 from itertools import groupby def get_lines_from_file(file_name): with open(file_name) as reader: for line in

Re: Parsing multiple lines from text file using regex

2013-10-28 Thread Oscar Benjamin
On 28 October 2013 00:35, Marc m...@marcd.org wrote: What was wrong with the answer Peter Otten gave you earlier today on the tutor mailing list? -- Python is the second best programming language in the world. But the best has yet to be invented. Christian Tismer Mark Lawrence I did not

RE: Parsing multiple lines from text file using regex

2013-10-28 Thread Marc
Hi Marc, did you actually subscribe to the tutor list or did you just send an email there? Peter replied to you and you can see the reply here: https://mail.python.org/pipermail/tutor/2013-October/098156.html He only sent the reply back to the tutor list and didn't email it directly to you

Parsing multiple lines from text file using regex

2013-10-27 Thread Marc
text Banner text Banner text ... banner text delimiter The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and banner.group(2) captures the delimiter nicely. My issue is that I need to capture the lines between the delimiters (both delimiters are the same). I have tried various

Re: Parsing multiple lines from text file using regex

2013-10-27 Thread Rhodri James
the following structure: banner option banner text delimiter Banner text Banner text Banner text ... banner text delimiter The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and banner.group(2) captures the delimiter nicely. My issue is that I need to capture the lines between

Re: Parsing multiple lines from text file using regex

2013-10-27 Thread Mark Lawrence
banner text delimiter Banner text Banner text Banner text ... banner text delimiter The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and banner.group(2) captures the delimiter nicely. My issue is that I need to capture the lines between the delimiters (both delimiters are the same

Re: Parsing multiple lines from text file using regex

2013-10-27 Thread Roy Smith
In article op.w5mwa3iaa8ncjz@gnudebeest, Rhodri James rho...@wildebst.demon.co.uk wrote: I really, really wouldn't do this with a single regexp. You'll get a much easier to understand program if you implement a small state machine instead. And what is a regex if not a small state

Re: Parsing multiple lines from text file using regex

2013-10-27 Thread Ben Finney
. And what is a regex if not a small state machine? Regex is not a state machine implemented by the original poster :-) Or, in other words, I interpret Rhodri as saying that the right way to do this is by implementing a *different* small state machine, which will address the task better than the small
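
The "small state machine" suggestion could look roughly like the sketch below. It assumes the layout described in the question (a line of the form banner, option, delimiter, followed by text lines and closed by a line holding the same delimiter); the function name parse_banners and the sample configuration are hypothetical.

```python
def parse_banners(lines):
    """Collect banner text per option using two states:
    outside a banner (delimiter is None) and inside one."""
    banners = {}
    option = None
    delimiter = None
    body = []
    for raw in lines:
        line = raw.rstrip("\n")
        if delimiter is None:
            # Outside a banner: look for "banner <option> <delimiter>".
            parts = line.split()
            if len(parts) == 3 and parts[0] == "banner":
                option, delimiter = parts[1], parts[2]
                body = []
        elif line.strip() == delimiter:
            # Closing delimiter: store the collected text and leave the banner.
            banners[option] = body
            delimiter = None
        else:
            # Inside a banner: every line belongs to the text.
            body.append(line)
    return banners


config = [
    "hostname r1",
    "banner motd ^C",
    "Welcome",
    "Authorized use only",
    "^C",
    "interface eth0",
]
print(parse_banners(config))
```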

RE: Parsing multiple lines from text file using regex

2013-10-27 Thread Marc
What was wrong with the answer Peter Otten gave you earlier today on the tutor mailing list? -- Python is the second best programming language in the world. But the best has yet to be invented. Christian Tismer Mark Lawrence I did not receive any answers from the Tutor list, so I thought I'd

Re: Parsing multiple lines from text file using regex

2013-10-27 Thread Mark Lawrence
from the Tutor list, so I thought I'd ask here. If an answer was posted to the Tutor list, it never made it to my inbox. Thanks to all that responded. Okay, the following is taken directly from Peter's reply to you. Please don't shoot the messenger :) You can reference a group in the regex
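
The quoted advice is cut off here, so the following is only a sketch of the general idea of referencing a group inside the regex (a backreference), not Peter Otten's actual code. The sample text and the exact pattern are assumptions made for the example.

```python
import re

# Hypothetical configuration fragment; "^C" plays the role of the delimiter.
text = "banner motd ^C\nWelcome\nAuthorized use only\n^C\nhostname r1\n"

# Group 2 captures the delimiter; \2 then requires the same text again,
# so the lazy group 3 stops at the closing delimiter.
banner_re = re.compile(r"banner\s+(\w+)\s+(\S+)\n(.*?)\n\2", re.DOTALL)

m = banner_re.search(text)
option, delimiter, body = m.group(1, 2, 3)

print(option)
print(delimiter)
print(body.splitlines())
```

A backreference matches the captured text literally, so a delimiter containing regex metacharacters needs no escaping.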

[issue13592] repr(regex) doesn't include actual regex

2013-10-27 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- assignee: - serhiy.storchaka nosy: +serhiy.storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13592 ___

[issue19408] Regex with set of characters and groups raises error

2013-10-26 Thread Isis Binder
New submission from Isis Binder: I was working on some SPOJ exercises when the regex module hit me with an error related to '*' being used inside the character set operator. I looked in the module docs but it says: Special characters lose their special meaning inside sets. For example

[issue19408] Regex with set of characters and groups raises error

2013-10-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: From re documentation: Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f]

[issue19408] Regex with set of characters and groups raises error

2013-10-26 Thread Matthew Barnett
Matthew Barnett added the comment: The traceback says bad character range because ord('+') == 43 and ord('*') == 42. It's not surprising that it complains if the range isn't valid. -- ___ Python tracker rep...@bugs.python.org
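
To illustrate the explanation above: inside a set, '-' between two characters forms a range, and a range from '+' (43) down to '*' (42) is invalid. The set [+-*] below is an assumed reconstruction of the kind of pattern involved, since the original report is truncated.

```python
import re

# '+-*' is read as a range from '+' (43) to '*' (42), which is reversed.
try:
    re.compile(r"[+-*]")
except re.error as exc:
    range_error = str(exc)
else:
    range_error = None

# Putting '-' last (or first) makes it a literal character.
operators = re.findall(r"[+*-]", "1+2*3-4")

# Escaping '-' has the same effect and keeps the original order.
escaped = re.findall(r"[+\-*]", "1+2*3-4")

print(range_error)
print(operators)
print(escaped)
```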

[issue19408] Regex with set of characters and groups raises error

2013-10-26 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- stage: - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19408 ___

status of regex modules

2013-10-24 Thread Mark Lawrence
The new module is now five years old. PEP 429 Python 3.4 release schedule has it listed under Other proposed large-scale changes but I don't believe this is actually happening. Lots of issues on the bug tracker have been closed as fixed in the new module, see issue 2636 for more data. Some

[issue19322] Python crashes on re.search in new regex module.

2013-10-20 Thread David
New submission from David: Python crashes while executing the following code using the new regex module. Have I made a mistake? import regex as re rx = re.compile(r'\bt(est){i2}', flags=re.V1) print Prints here rx.findall(Some text) # Python crashes print Fails to print I get the same results

[issue19322] Python crashes on re.search in new regex module.

2013-10-20 Thread Ned Deily
Ned Deily added the comment: The regex module is a third-party project and is not part of the Python standard library. I suggest you open an issue on the issue tracker for the project and include more detailed information about the problem: https://code.google.com/p/mrab-regex-hg/ https

[issue18951] In unittest.TestCase.assertRegex change re and regex to r

2013-09-13 Thread Terry J. Reedy
letter to be consistent with the rest of the table. I think the patch should be applied. * Using 're' and 'regex' to mean the same object *is* confusing. -- nosy: +terry.reedy stage: - commit review ___ Python tracker rep...@bugs.python.org http

[issue18951] In unittest.TestCase.assertRegex change re and regex to r

2013-09-13 Thread py.user
py.user added the comment: ok, I will repeat patch contents in message by words to avoid guessing -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18951 ___

[issue18951] In unittest.TestCase.assertRegex change re and regex to r

2013-09-13 Thread Ezio Melotti
Ezio Melotti added the comment: Fixed, thanks for the report and the patch! I also updated the table entry for assertRaisesRegex and assertWarnsRegex. I avoid using regex instead of re or r to keep it short and avoid confusion with the re/regex modules. The documentation of the methods uses

[issue18951] In unittest.TestCase.assertRegex change re and regex to r

2013-09-13 Thread Roundup Robot
Roundup Robot added the comment: New changeset 03e94f9884ce by Ezio Melotti in branch '3.3': #18951: use consistent names in unittest docs. http://hg.python.org/cpython/rev/03e94f9884ce New changeset eb332e3dc303 by Ezio Melotti in branch 'default': #18951: merge with 3.3.

[issue18951] In unittest.TestCase.assertRegex change re and regex to r

2013-09-06 Thread py.user
, py.user priority: normal severity: normal status: open title: In unittest.TestCase.assertRegex change re and regex to r type: enhancement versions: Python 3.3, Python 3.4 Added file: http://bugs.python.org/file31636/issue.diff ___ Python tracker rep...@bugs.python.org

[issue18832] New regex module degrades re performance

2013-08-25 Thread Tal Weiss
New submission from Tal Weiss: All tests I ran comparing timing of the new regex module relative to the old re module showed significant slower performance. I'm attaching test code with regular expressions from our production server. Tested on Python 2.7, 64 bit Linux + 64 bit Windows 7. regex

[issue18832] New regex module degrades re performance

2013-08-25 Thread Matthew Barnett
Matthew Barnett added the comment: The 'regex' module is not part of the CPython distribution, so it's not covered by this tracker. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18832

[issue18832] New regex module degrades re performance

2013-08-25 Thread Ned Deily
Changes by Ned Deily n...@acm.org: -- resolution: - invalid stage: - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18832 ___

[issue18562] Regex howto: revision pass

2013-08-18 Thread Ezio Melotti
Ezio Melotti added the comment: #17441 also has a discussion about regex caching that might be relevant. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18562

[issue18562] Regex howto: revision pass

2013-08-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset 366ca21600c9 by Andrew Kuchling in branch '3.3': #18562: various revisions to the regex howto for 3.x http://hg.python.org/cpython/rev/366ca21600c9 -- nosy: +python-dev ___ Python tracker rep

[issue18562] Regex howto: revision pass

2013-08-18 Thread A.M. Kuchling
Changes by A.M. Kuchling li...@amk.ca: -- resolution: - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18562 ___ ___

[issue18562] Regex howto: revision pass

2013-08-18 Thread A.M. Kuchling
Changes by A.M. Kuchling li...@amk.ca: -- stage: patch review - committed/rejected ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18562 ___ ___

[issue18562] Regex howto: revision pass

2013-08-17 Thread A.M. Kuchling
A.M. Kuchling added the comment: Slightly revised version that modifies the discussion of when to pre-compile a regex and when to not bother. I don't think this is a very important issue, so I don't think it needs a long discussion. -- Added file: http://bugs.python.org/file31348

[issue18562] Regex howto: revision pass

2013-08-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: Well, this is already too long IMO. Two sentences should suffice. If you are calling a regex very often in a loop, then it makes sense to compile it. Otherwise, don't bother. -- nosy: +pitrou ___ Python tracker rep

[issue18562] Regex howto: revision pass

2013-08-06 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- components: +Regular Expressions nosy: +mrabarnett type: - enhancement ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18562 ___

[issue18562] Regex howto: revision pass

2013-07-26 Thread A.M. Kuchling
New submission from A.M. Kuchling: I read through the 3.3 regex howto and have made various edits in the attached patch. * describe how \w is different when used in bytes and Unicode patterns. * describe re.ASCII flag to change that behaviour. * remove a personal reference ('I generally

[issue18562] Regex howto: revision pass

2013-07-26 Thread Berker Peksag
Changes by Berker Peksag berker.pek...@gmail.com: -- nosy: +ezio.melotti versions: +Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18562 ___

[issue18155] csv.Sniffer.has_header doesn't escape characters used in regex

2013-06-29 Thread Roundup Robot
Roundup Robot added the comment: New changeset 68ff68f9a0d5 by R David Murray in branch '3.3': #18155: Regex-escape delimiter, in case it is a regex special char. http://hg.python.org/cpython/rev/68ff68f9a0d5 New changeset acaf73e3d882 by R David Murray in branch 'default': Merge #18155: Regex

[issue18155] csv.Sniffer.has_header doesn't escape characters used in regex

2013-06-29 Thread R. David Murray
R. David Murray added the comment: Committed, with slight modifications to the tests. Thanks Vajrasky. -- resolution: - fixed stage: needs patch - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org

Re: Why is regex so slow?

2013-06-19 Thread Johannes Bauer
On 18.06.2013 22:30, Grant Edwards wrote: All the O() tells you is the general shape of the line. Nitpick: it only gives an *upper bound* for the complexity. Any function that is within O(n) is also within O(n^2). Usually when people say O() they actually mean capital Theta (which is the

Re: Why is regex so slow?

2013-06-19 Thread Duncan Booth
. Or as the source puts it: it's a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. Also the regex library has to do a whole lot more than just figuring out if it got a match, so you have massively over-simplified it. -- Duncan Booth http://kupuguy.blogspot.com -- http

Re: Why is regex so slow?

2013-06-19 Thread Roy Smith
]); is essentially (well, sort of) the same as the compile() step of a regex. For the (presumably) common use case of searching many strings for the same substring (which is what we're doing here), it seems like it would be a win to cache the mask and reuse it if the search string id is the same

Why is regex so slow?

2013-06-18 Thread Roy Smith
/(.*)') count = 0 for line in open('error.log'): m = pattern.search(line) if m: count += 1 print count -- If I add a pre-filter before the regex, it runs in 0.78 seconds (about twice the speed!) -- import re pattern = re.compile
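
The two variants compared in this thread can be sketched side by side. The pattern and the 'ENQ' pre-filter are taken from the quoted messages; the function names and the in-memory sample lines are assumptions, used here in place of the 170 MB error.log so the sketch runs on its own. No timings are claimed for it.

```python
import re

PATTERN = re.compile(r"ENQUEUEING: /listen/(.*)")


def count_plain(lines):
    # Run the regex on every line.
    return sum(1 for line in lines if PATTERN.search(line))


def count_prefiltered(lines):
    # Cheap substring test first; the regex only runs on candidate lines.
    return sum(1 for line in lines if "ENQ" in line and PATTERN.search(line))


sample = [
    "[2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one",
    "[2010-10-20 16:47:51.000000 -04:00] INFO (6): unrelated message",
    "[2010-10-20 16:47:52.000000 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/another-station",
]

print(count_plain(sample), count_prefiltered(sample))
```

Both functions return the same count as long as every line matching the regex also contains 'ENQ', which the thread states is true for the log in question.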

Re: Why is regex so slow?

2013-06-18 Thread Skip Montanaro
I don't understand why the first way is so much slower. I have no obvious answers, but a couple suggestions: 1. Can you anchor the pattern at the beginning of the line? (use match() instead of search()) 2. Does it get faster if you eliminate the (.*) part of the pattern? It seems that if you

Re: Why is regex so slow?

2013-06-18 Thread Roy Smith
of creating a group. At this point, I'm not so much interested in making this faster as understanding why it's so slow. I'm tempted to open this up as a performance bug against the regex module (which I assume will be rejected, at least for the 2.x series). --- Roy Smith r...@panix.com -- http

Re: Why is regex so slow?

2013-06-18 Thread Chris Angelico
On Wed, Jun 19, 2013 at 3:08 AM, Roy Smith r...@panix.com wrote: I'm tempted to open this up as a performance bug against the regex module (which I assume will be rejected, at least for the 2.x series). Yeah, I'd try that against 3.3 before opening a performance bug. Also, it's entirely

Re: Why is regex so slow?

2013-06-18 Thread MRAB
= re.compile(r'ENQUEUEING: /listen/(.*)') count = 0 for line in open('error.log'): m = pattern.search(line) if m: count += 1 print count -- If I add a pre-filter before the regex, it runs in 0.78 seconds (about twice the speed

Re: Why is regex so slow?

2013-06-18 Thread Mark Lawrence
the line yourself instead of creating a group. At this point, I'm not so much interested in making this faster as understanding why it's so slow. I'm tempted to open this up as a performance bug against the regex module (which I assume will be rejected, at least for the 2.x series). --- Roy

Re: Why is regex so slow?

2013-06-18 Thread Johannes Bauer
On 18.06.2013 19:20, Chris Angelico wrote: Yeah, I'd try that against 3.3 before opening a performance bug. Also, it's entirely possible that performance is majorly different in 3.x anyway, on account of strings being Unicode. Definitely merits another look imho. Hmmm, at least Python 3.2

Re: Why is regex so slow?

2013-06-18 Thread Roy Smith
In article mailman.3549.1371576854.3114.python-l...@python.org, Mark Lawrence breamore...@yahoo.co.uk wrote: Out of curiosity have you tried the new regex module from pypi rather than the stdlib version? A heck of a lot of work has gone into it see http://bugs.python.org/issue2636 I just

Re: Why is regex so slow?

2013-06-18 Thread Rick Johnson
On Tuesday, June 18, 2013 11:45:29 AM UTC-5, Roy Smith wrote: I've got a 170 MB file I want to search for lines that look like: [2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one This code runs in 1.3 seconds:

Re: Why is regex so slow?

2013-06-18 Thread Roy Smith
On Tuesday, June 18, 2013 2:10:16 PM UTC-4, Johannes Bauer wrote: Resulting file has a size of 91530018 and md5 of 2d20c3447a0b51a37d28126b8348f6c5 (just to make sure we're on the same page because I'm not sure the PRNG is stable across Python versions). If people want to test against my

Re: Why is regex so slow?

2013-06-18 Thread MRAB
On 18/06/2013 20:21, Roy Smith wrote: In article mailman.3549.1371576854.3114.python-l...@python.org, Mark Lawrence breamore...@yahoo.co.uk wrote: Out of curiosity have you tried the new regex module from pypi rather than the stdlib version? A heck of a lot of work has gone into it see http

Re: Why is regex so slow?

2013-06-18 Thread André Malo
* Johannes Bauer wrote: The pre-check version is about 42% faster in my case (0.75 sec vs. 1.3 sec). Curious. This is Python 3.2.3 on Linux x86_64. A lot of time is spent with dict lookups (timings at my box, Python 3.2.3) in your inner loop (150 times...) #!/usr/bin/python3 import re

Re: Why is regex so slow?

2013-06-18 Thread Antoine Pitrou
Roy Smith roy at panix.com writes: Every line which contains 'ENQ' also matches the full regex (61425 lines match, out of 2.1 million total). I don't understand why the first way is so much slower. One invokes a fast special-purpose substring searching routine (the str.__contains__ operator

Re: Why is regex so slow?

2013-06-18 Thread André Malo
* André Malo wrote: * Johannes Bauer wrote: The pre-check version is about 42% faster in my case (0.75 sec vs. 1.3 sec). Curious. This is Python 3.2.3 on Linux x86_64. A lot of time is spent with dict lookups (timings at my box, Python 3.2.3) in your inner loop (150 times...) [...]

Re: Why is regex so slow?

2013-06-18 Thread Roy Smith
On Tuesday, June 18, 2013 4:05:25 PM UTC-4, Antoine Pitrou wrote: One invokes a fast special-purpose substring searching routine (the str.__contains__ operator), the other a generic matching engine able to process complex patterns. It's hardly a surprise for the specialized routine to be

Re: Why is regex so slow?

2013-06-18 Thread Grant Edwards
On 2013-06-18, Antoine Pitrou solip...@pitrou.net wrote: Roy Smith roy at panix.com writes: You should read again on the O(...) notation. It's an asymptotic complexity, it tells you nothing about the exact function values at different data points. So you can have two O(n) routines, one of

Re: Why is regex so slow?

2013-06-18 Thread Terry Reedy
On 6/18/2013 4:30 PM, Grant Edwards wrote: On 2013-06-18, Antoine Pitrou solip...@pitrou.net wrote: Roy Smith roy at panix.com writes: You should read again on the O(...) notation. It's an asymptotic complexity, it tells you nothing about the exact function values at different data points. So

Re: Why is regex so slow?

2013-06-18 Thread Steven D'Aprano
: -- import re pattern = re.compile(r'ENQUEUEING: /listen/(.*)') count = 0 for line in open('error.log'): m = pattern.search(line) if m: count += 1 print count -- If I add a pre-filter before the regex, it runs in 0.78 seconds (about twice

Re: Why is regex so slow?

2013-06-18 Thread Dave Angel
On 06/18/2013 09:51 PM, Steven D'Aprano wrote: SNIP Even if the regex engine is just as efficient at doing simple character matching as `in`, and it probably isn't, your regex tries to match all eleven characters of ENQUEUEING while the `in` test only has to match three, ENQ. The rest

Re: Why is regex so slow?

2013-06-18 Thread Steven D'Aprano
On Tue, 18 Jun 2013 22:11:01 -0400, Dave Angel wrote: On 06/18/2013 09:51 PM, Steven D'Aprano wrote: SNIP Even if the regex engine is just as efficient at doing simple character matching as `in`, and it probably isn't, your regex tries to match all eleven characters of ENQUEUEING

[issue18155] csv.Sniffer.has_header doesn't escape characters used in regex

2013-06-14 Thread A.M. Kuchling
Changes by A.M. Kuchling li...@amk.ca: -- nosy: +akuchling ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18155 ___ ___ Python-bugs-list mailing

[issue18155] csv.Sniffer.has_header doesn't escape characters used in regex

2013-06-07 Thread Dave Challis
New submission from Dave Challis: When attempting to detect the presence of CSV headers, delimiters are passed to a regex function without escaping, which causes an exception if a delimiter which has meaning in a regex (e.g. '+', '*' etc.) is used. Code to reproduce: import csv s
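
The underlying problem, a delimiter with regex meaning being used in a pattern unescaped, can be shown without csv.Sniffer itself. This sketch uses only re.escape from the standard library; the helper name split_on_delimiter is hypothetical and is not part of the csv module or of the committed patch.

```python
import re


def split_on_delimiter(text, delimiter):
    # re.escape turns characters such as '+' or '*' into literals.
    return re.split(re.escape(delimiter), text)


# Used unescaped, '+' on its own is not a valid pattern.
try:
    re.compile("+")
except re.error as exc:
    unescaped_error = str(exc)
else:
    unescaped_error = None

print(unescaped_error)
print(split_on_delimiter("a+b+c", "+"))
print(split_on_delimiter("x*y", "*"))
```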

[issue18155] csv.Sniffer.has_header doesn't escape characters used in regex

2013-06-07 Thread R. David Murray
R. David Murray added the comment: I doubt this is a regression, so I'm marking the others versions as well without actually testing it. Should be an easy fix. -- keywords: +easy nosy: +r.david.murray stage: - needs patch versions: +Python 2.7, Python 3.4
