Re: Regex Python Help
On Tue, Mar 24, 2015 at 1:13 PM, gdot...@gmail.com wrote: SyntaxError: Missing parentheses in call to 'print' It appears you are attempting to use a Python 2.x print statement with Python 3.x Try changing the last line to print(line.rstrip()) Skip -- https://mail.python.org/mailman/listinfo/python-list
[issue2636] Adding a new regex module (compatible with re)
Changes by Evgeny Kapun abacabadabac...@gmail.com: -- nosy: +abacabadabacaba ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2636 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: regex help
Larry Martell wrote:
> I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number. For example, if I have this:
>
> 14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>
> I want to end up with:
>
> 14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>
> I have a regex to remove the zeros: '0+[,$]', ''
>
> But I can't figure out how to get the 5. to be 5.0. I've been messing with the negative lookbehind, but I haven't found one that works for this.

First of all, I find it unlikely that you really want to solve your problem with regular expressions. Google “X-Y problem”.

Second, if you must use regular expressions, the simplest approach is to use backreferences.

Third, you need to show the relevant (Python) code.

http://www.catb.org/~esr/faqs/smart-questions.html

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Re: regex help
On 2015-03-13 12:05, Larry Martell wrote:
> I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number.
> But I can't figure out how to get the 5. to be 5.0. I've been messing with the negative lookbehind, but I haven't found one that works for this.

You can do it with string-ops, or you can resort to regexp. Personally, I like the clarity of the string-ops version, but use what suits you.

-tkc

    import re

    # test data: raw fields and the expected result for each
    input = [
        '14S',
        '5.',
        '4.5686274500',
        '3.7272727272727271',
        '3.3947368421052630',
        '5.7307692307692308',
        '5.7547169811320753',
        '4.9423076923076925',
        '5.7884615384615383',
        '5.13725490196',
    ]
    output = [
        '14S',
        '5.0',
        '4.56862745',
        '3.7272727272727271',
        '3.394736842105263',
        '5.7307692307692308',
        '5.7547169811320753',
        '4.9423076923076925',
        '5.7884615384615383',
        '5.13725490196',
    ]

    def fn1(s):
        # string-ops version: strip zeros, then restore one after a bare dot
        if '.' in s:
            s = s.rstrip('0')
            if s.endswith('.'):
                s += '0'
        return s

    def fn2(s):
        # regexp version: needs at least one digit after the dot,
        # so a bare '5.' is left unchanged
        return re.sub(r'(\.\d+?)0+$', r'\1', s)

    for fn in (fn1, fn2):
        for i, o in zip(input, output):
            v = fn(i)
            print("%s: %s - %s [%s]" % (v == o, i, v, o))
Re: regex help
On 2015-03-13 16:05, Larry Martell wrote:
> I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number. For example, if I have this:
>
> 14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>
> I want to end up with:
>
> 14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>
> I have a regex to remove the zeros: '0+[,$]', ''
>
> But I can't figure out how to get the 5. to be 5.0. I've been messing with the negative lookbehind, but I haven't found one that works for this.

Search: (\.\d+?)0+\b
Replace: \1

which is:

    re.sub(r'(\.\d+?)0+\b', r'\1', string)
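A minimal sketch of how the suggested substitution behaves, in case it helps to see it run. The sample line is an assumption made for illustration (it uses 5.0000 rather than the bare 5. shown in the question), and the function name is hypothetical.

```python
import re

# The pattern from the reply above: keep the dot plus the shortest run of
# digits, then drop the zeros that follow, up to a word boundary.
TRAILING_ZEROS = re.compile(r'(\.\d+?)0+\b')

def strip_trailing_zeros(line):
    """Apply the substitution to a whole comma-separated line."""
    return TRAILING_ZEROS.sub(r'\1', line)

# Assumed sample data, not the exact line from the question.
sample = '14S,5.0000,4.5686274500,3.3947368421052630,5.13725490196'
print(strip_trailing_zeros(sample))
# prints 14S,5.0,4.56862745,3.394736842105263,5.13725490196

# A bare '5.' has no digit after the dot, so the pattern does not touch it.
print(strip_trailing_zeros('5.'))
# prints 5.
```

Because the group requires at least one digit after the dot, one digit always survives, which is what turns 5.0000 into 5.0. A field that has already been reduced to "5." would need a separate step to get its zero back.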
Re: regex help
On Fri, Mar 13, 2015 at 1:29 PM, MRAB pyt...@mrabarnett.plus.com wrote:
> On 2015-03-13 16:05, Larry Martell wrote:
>> I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number. For example, if I have this:
>>
>> 14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>>
>> I want to end up with:
>>
>> 14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>>
>> I have a regex to remove the zeros: '0+[,$]', ''
>>
>> But I can't figure out how to get the 5. to be 5.0. I've been messing with the negative lookbehind, but I haven't found one that works for this.
>
> Search: (\.\d+?)0+\b
> Replace: \1
>
> which is:
>
> re.sub(r'(\.\d+?)0+\b', r'\1', string)

Thanks! That works perfectly.
Re: regex help
On 13Mar2015 12:05, Larry Martell larry.mart...@gmail.com wrote:
> I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number. For example, if I have this:
>
> 14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>
> I want to end up with:
>
> 14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>
> I have a regex to remove the zeros: '0+[,$]', ''
>
> But I can't figure out how to get the 5. to be 5.0. I've been messing with the negative lookbehind, but I haven't found one that works for this.

Leaving aside the suggested non-greedy match, you can rephrase this: strip trailing zeroes _after_ the first decimal digit. Then you can consider a number to be:

  digits
  point
  any digit
  other digits to be right-zero stripped

so:

  (\d+\.\d)(\d*[1-9])?0*\b

and keep .group(1) and .group(2) from the match.

Another way of considering the problem.

Or you could two-step it. Strip all trailing zeroes. If the result ends in a dot, add a single zero.

Cheers,
Cameron Simpson c...@zip.com.au

C'mon. Take the plunge. By the time you go through rehab the first time, you'll be surrounded by the most interesting people, and if it takes years off of your life, don't sweat it. They'll be the last ones anyway. - Vinnie Jordan, alt.peeves
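The "keep group(1) and group(2)" idea might be sketched as follows. The helper names are hypothetical, and a replacement function is used on the assumption that group(2) can be unmatched (it is None when only zeros follow the first decimal digit).

```python
import re

# The pattern from the message above: integer part, dot and first decimal
# digit in group 1; the remaining digits up to the last non-zero one in
# group 2; trailing zeros outside both groups.
NUMBER = re.compile(r'(\d+\.\d)(\d*[1-9])?0*\b')

def _keep_groups(match):
    # group(2) is None when nothing but zeros follows the first decimal digit
    return match.group(1) + (match.group(2) or '')

def strip_zeros(line):
    """Rewrite every decimal number in the line without its trailing zeros."""
    return NUMBER.sub(_keep_groups, line)

print(strip_zeros('5.0000,4.5686274500,3.7272727272727271'))
# prints 5.0,4.56862745,3.7272727272727271
```

As with the non-greedy version, a bare "5." does not match at all, because the pattern requires a digit after the dot.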
Re: regex help
Larry Martell wrote:
> I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number.

    def strip_zero(s):
        if '.' not in s:
            return s
        s = s.rstrip('0')
        if s.endswith('.'):
            s += '0'
        return s

And in use:

    py> strip_zero('-10.2500')
    '-10.25'
    py> strip_zero('123000')
    '123000'
    py> strip_zero('123000.')
    '123000.0'

It doesn't support exponential format:

    py> strip_zero('1.230e3')
    '1.230e3'

because it isn't clear what you intend to do under those circumstances.

--
Steven
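For the comma-separated line in the original question, the function above could be applied field by field. This is only a sketch; strip_line and its sep parameter are hypothetical names added for illustration.

```python
def strip_zero(s):
    # Same logic as the function quoted above.
    if '.' not in s:
        return s
    s = s.rstrip('0')
    if s.endswith('.'):
        s += '0'
    return s

def strip_line(line, sep=','):
    """Split the line into fields, clean each one, and join them again."""
    return sep.join(strip_zero(field) for field in line.split(sep))

raw = ('14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,'
       '5.7307692307692308,5.7547169811320753,4.9423076923076925,'
       '5.7884615384615383,5.13725490196')
print(strip_line(raw))
```

Unlike the regex variants, this also turns the bare "5." into "5.0", at the cost of treating every field that contains a dot as a number.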
regex help
I need to remove all trailing zeros to the right of the decimal point, but leave one zero if it's whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros: '0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0. I've been messing with the negative lookbehind, but I haven't found one that works for this.
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment:

Could anyone please make a review? This patch is a prerequisite of other patches.
[issue23532] add example of 'first match wins' to regex | documentation?
Matthew Barnett added the comment:

Not quite all. POSIX regexes will always look for the longest match, so the order of the alternatives doesn't matter, i.e. x|xy would give the same result as xy|x.
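For contrast with the POSIX behaviour described above, here is a small check of what Python's re module does with the same two patterns. The variable names are just for illustration.

```python
import re

# Python's re tries the alternatives left to right at a given position and
# takes the first one that succeeds, so the order matters.
first = re.match(r'x|xy', 'xy').group()
second = re.match(r'xy|x', 'xy').group()

print(first)   # prints x
print(second)  # prints xy
```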
[issue23532] regex | behavior differs from documentation
Changes by Rick Otten rottenwindf...@gmail.com:

--
components: Regular Expressions
nosy: Rick Otten, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: regex | behavior differs from documentation
type: behavior
versions: Python 2.7
[issue23532] regex | behavior differs from documentation
Mark Shannon added the comment:

This looks like the expected behaviour to me. re.sub matches the leftmost occurrence and the regular expression is greedy so (x|xy) will always match xy if it can.

--
nosy: +Mark.Shannon
[issue23532] regex | behavior differs from documentation
Rick Otten added the comment:

Can the documentation be updated to make this more clear? I see now where the clause "As the target string is scanned, ..." is describing what you have listed here. I and a coworker both read the description several times and missed that.

I thought it first tried "incorporated" against the whole string, then tried " inc" against the whole string, etc. When actually it was trying each, "incorporated" and " inc" and the others, against the first position of the string. And then again for the second position.

Since I want to force the order against the whole string before trying the next one for my particular use case, I'll do a series of re.subs instead of trying to do them all in one. It makes sense now and is easy to fix.

Thanks for looking at it and explaining what is happening more clearly. It was really not obvious. I tried at least 100 variations and wasn't seeing the pattern.
[issue23532] regex | behavior differs from documentation
Matthew Barnett added the comment:

@Mark is correct, it's not a bug.

In the first example:

It tries to match each alternative at position 0. Failure.
It tries to match each alternative at position 1. Failure.
It tries to match each alternative at position 2. Failure.
It tries to match each alternative at position 3. Success. ' inc' matches.

In the second example:

It tries to match each alternative at position 0. Failure.
It tries to match each alternative at position 1. Failure.
It tries to match each alternative at position 2. Failure.
It tries to match each alternative at position 3. Failure.
It tries to match each alternative at position 4. Success. 'incorporated' matches.

('inc' is a later alternative; it's considered only if the earlier alternatives have failed to match at that position.)
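The position-by-position walk-through above can be checked with re.search, and the "series of re.subs" workaround mentioned earlier in the issue can be sketched alongside it. The strip_suffixes helper is a hypothetical name, not code from the issue.

```python
import re

mystring = 'rwo incorporated'

# First example: ' inc' is the only alternative that can start at the space
# in position 3, so it wins before 'incorporated' is ever reached.
first = re.search('incorporated| inc|llc|corporation|corp| co', mystring)
print(first.start(), repr(first.group()))   # prints 3 ' inc'

# Second example: without the leading space nothing matches at position 3,
# and at position 4 'incorporated' is tried before 'inc'.
second = re.search('incorporated|inc|llc|corporation|corp| co', mystring)
print(second.start(), repr(second.group()))  # prints 4 'incorporated'

def strip_suffixes(text, suffixes):
    """Remove each suffix over the whole string before trying the next one."""
    for suffix in suffixes:
        text = re.sub(re.escape(suffix), '', text)
    return text

cleaned = strip_suffixes(
    mystring, ['incorporated', ' inc', 'llc', 'corporation', 'corp', ' co'])
print(repr(cleaned))  # prints 'rwo '
```

Running one substitution per alternative gives the list order priority over the position in the string, which is the behaviour the reporter expected from a single alternation.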
[issue23532] regex | behavior differs from documentation
New submission from Rick Otten:

The documentation states that | parsing goes from left to right. This doesn't seem to be true when spaces are involved (or \s).

Example:

In [40]: mystring
Out[40]: 'rwo incorporated'

In [41]: re.sub('incorporated| inc|llc|corporation|corp| co', '', mystring)
Out[41]: 'rwoorporated'

In this case " inc" was processed before "incorporated". If I take the space out:

In [42]: re.sub('incorporated|inc|llc|corporation|corp| co', '', mystring)
Out[42]: 'rwo '

"incorporated" is processed first.

If I put a space with each, then " incorporated" is processed first:

In [43]: re.sub(' incorporated| inc|llc|corporation|corp| co', '', mystring)
Out[43]: 'rwo'

And if I use \s instead of a space, it is processed first:

In [44]: re.sub('incorporated|\sinc|llc|corporation|corp| co', '', mystring)
Out[44]: 'rwoorporated'
[issue23532] add example of 'first match wins' to regex | documentation?
R. David Murray added the comment:

The thing is, what you describe is fundamental to how regular expressions work. I'm not sure it makes sense to add a specific mention of it to the '|' docs, since it applies to all regexes.

--
assignee:  -> docs@python
components: +Documentation -Regular Expressions
nosy: +docs@python, r.david.murray
title: regex | behavior differs from documentation -> add example of 'first match wins' to regex | documentation?
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment:

> Messages tend to be abbreviated, so I think that it would be better to just omit the article.

I agree, but this came from standard error messages, which are not consistent. I opened a thread on Python-Dev.

"expected a bytes-like object" and "expected str instance" are standard error messages raised in bytes.join and str.join, not in re. We could change them though.

> I don't think that the error message "bad repeat interval" is an improvement (Why is it bad? What is an interval?). I think that saying that the min is greater than the max is clearer.

Agree. I'll change this in re.

What message is better in case of overflow: "the repetition number is too large" (in re) or "repeat count too big" (in regex)?
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment:

Here is a patch for regex which makes some error messages be the same as in re with re_errors_2.patch. You could apply it to regex if new error messages look better than old error messages. Otherwise we could change re error messages to match regex, or discuss better variants.

--
Added file: http://bugs.python.org/file38171/regex_errors.diff
[issue22364] Improve some re error messages using regex for hints
Matthew Barnett added the comment:

Some error messages use the indefinite article:

    expected a bytes-like object, %.200s found
    cannot use a bytes pattern on a string-like object
    cannot use a string pattern on a bytes-like object

but others don't:

    expected string instance, %.200s found
    expected str instance, %.200s found

Messages tend to be abbreviated, so I think that it would be better to just omit the article.

I don't think that the error message "bad repeat interval" is an improvement (Why is it bad? What is an interval?). I think that saying that the min is greater than the max is clearer.
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment:

Updated patch addresses Ezio's comments.

--
Added file: http://bugs.python.org/file38080/re_errors_2.patch
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment:

Here is a patch which unifies and improves re error messages. Added tests for all parsing errors. Now the error message always points to the start of the affected component, i.e. to the start of the bad escape, group name or unterminated subpattern.

--
stage: needs patch -> patch review
Added file: http://bugs.python.org/file38035/re_errors.patch
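To see what a parsing error currently reports on your own interpreter, something like the sketch below could be used. It assumes Python 3.5 or later, where re.error exposes msg and pos attributes; describe_error is a hypothetical helper, and the exact message wording may differ between versions.

```python
import re

def describe_error(pattern):
    """Return (message, position) for an invalid pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # msg is the unformatted message; pos is the index in the pattern
        # where compilation failed (assumes Python 3.5+).
        return exc.msg, exc.pos
    return None

# A repeat whose minimum exceeds its maximum, as discussed in this issue.
print(describe_error(r'a{3,2}'))
print(describe_error(r'a{2,3}'))  # a valid pattern, so this prints None
```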
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment:

re_errors_diff.txt contains differences for all tested error messages.

--
Added file: http://bugs.python.org/file38036/re_errors_diff.txt
[issue23191] fnmatch regex cache use is not threadsafe
Changes by Serhiy Storchaka storch...@gmail.com:

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
[issue23191] fnmatch regex cache use is not threadsafe
Roundup Robot added the comment:

New changeset fe12c34c39eb by Serhiy Storchaka in branch '2.7':
Issue #23191: fnmatch functions that use caching are now threadsafe.
https://hg.python.org/cpython/rev/fe12c34c39eb

--
nosy: +python-dev
[issue23191] fnmatch regex cache use is not threadsafe
Changes by Serhiy Storchaka storch...@gmail.com:

--
assignee:  -> serhiy.storchaka
[issue23318] (compiled RegEx).split gives unexpected results if () in pattern
New submission from Dave Notman:

    # Python 3.3.1 (default, Sep 25 2013, 19:30:50)
    # Linux 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux

    import re

    splitter = re.compile( r'(\s*[+/;,]\s*)|(\s+and\s+)' )
    ll = splitter.split( 'Dave Sam, Jane and Zoe' )
    print(repr(ll))

    print( 'Try again with revised RegEx' )
    splitter = re.compile( r'(?:(?:\s*[+/;,]\s*)|(?:\s+and\s+))' )
    ll = splitter.split( 'Dave Sam, Jane and Zoe' )
    print(repr(ll))

Results:

    ['Dave', ' ', None, 'Sam', ', ', None, 'Jane', None, ' and ', 'Zoe']
    Try again with revised RegEx
    ['Dave', 'Sam', 'Jane', 'Zoe']

--
components: Regular Expressions
messages: 234677
nosy: dnotmanj, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: (compiled RegEx).split gives unexpected results if () in pattern
type: behavior
versions: Python 3.3
[issue23318] (compiled RegEx).split gives unexpected results if () in pattern
SilentGhost added the comment:

Looks like it works exactly as the docs[1] describe:

    >>> re.split(r'\s*[+/;,]\s*|\s+and\s+', string)
    ['Dave', 'Sam', 'Jane', 'Zoe']

You're using capturing groups (parentheses) in your original regex, which returns separators as part of a match.

[1] https://docs.python.org/3/library/re.html#re.split

--
nosy: +SilentGhost
resolution:  -> not a bug
status: open -> closed
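A small side-by-side sketch of the two behaviours. The input string here is an assumption chosen so that every separator in the pattern is exercised (it includes a "+", which the string in the report as archived does not show).

```python
import re

names = 'Dave + Sam, Jane and Zoe'  # assumed sample input

# Capturing groups: re.split also returns what each group matched, with
# None for the group that did not take part in a given separator.
capturing = re.compile(r'(\s*[+/;,]\s*)|(\s+and\s+)')
with_separators = capturing.split(names)
print(with_separators)
# prints ['Dave', ' + ', None, 'Sam', ', ', None, 'Jane', None, ' and ', 'Zoe']

# Non-capturing groups (or no groups at all): only the pieces are returned.
non_capturing = re.compile(r'(?:\s*[+/;,]\s*)|(?:\s+and\s+)')
pieces = non_capturing.split(names)
print(pieces)
# prints ['Dave', 'Sam', 'Jane', 'Zoe']
```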
Re: Python 3 regex?
Thomas 'PointedEars' Lahn wrote:
> wxjmfa...@gmail.com wrote:
[...]
>> And, why not? compare Py3.2 and Py3.3+ !
>
> What are you getting at?

Don't waste your time with JMF. He is obsessed with a trivial performance regression in Python 3.3. Unicode strings can be slightly more expensive to create in Python 3.3 compared to earlier versions, due to a clever memory optimization which saves up to 50% if your strings are all in the Basic Multilingual Plane and up to 75% if they are all in Latin-1.

Never mind that for real-world code, that memory saving can often lead to applications running faster, JMF is obsessed with an artificial benchmark of his own devising that involves making, and throwing away, thousands of Unicode strings as fast as possible in such a way as to exercise the worst-case of the new Unicode model.

From this unimportant performance regression, he has convinced himself that this means that Python 3.3 and beyond is logically and mathematically in violation of the Unicode standard.

Any time JMF mentions anything to do with Python versions or Unicode or ASCII or French, he is in full-blown "pi equals 3 exactly" crank territory and is best ignored.

--
Steven
Re: Python 3 regex?
On Tuesday, January 13, 2015 at 10:06:50 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 12 Jan 2015 19:48:18 +, Ian wrote:
>> My recommendation would be to write a recursive descent parser for your files. That way will be easier to write,
>
> I know that writing parsers is a solved problem in computer science, and that doing so is allegedly one of the more trivial things computer scientists are supposed to be able to do, but the learning curve to write parsers is if anything even higher than the learning curve to write a regex.
>
> I wish that Python made it as easy to use EBNF to write a parser as it makes to use a regex :-(
>
> http://en.wikipedia.org/wiki/Extended_Backus-Naur_Form
>
> --
> Steven

There appears to be at least one python package for this:
https://pypi.python.org/pypi/iscconf

And for those wanting to use regexes to parse CFGs, the required reading is:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
Re: Python 3 regex?
On Wed, 14 Jan 2015 14:02:27 +0100, Thomas 'PointedEars' Lahn wrote:
> wxjmfa...@gmail.com wrote:
>> On Tuesday, 13 January 2015 03:53:43 UTC+1, Rick Johnson wrote:
>>> [...] you should find Python's text processing Nirvana [...]
>> I recommend, you write a small application
>
> I recommend you get a real name and do not post using the troll and spam-infested Google Groups, but a newsreader instead.
>
>> sorting strings composed of latin characters, a sort based on diacritical characters
>
> I do not think you need regular expressions for that: you can use Unicode collations.
>
>> (and eventually, taking into account linguistic specific aspects).
>
> BTDT. For a translator application, I used Python to sort a dictionary of the Latin phonetic transcription of Golic Vulcan whose alphabet is “S T P K R L A Sh O U D V Kh E H G Ch I N Zh M Y F Z Th W B” [1]. re helped a lot with that because inversely sorting the list by character length and turning it into an alternation allowed me to easily find the characters in words, and assign numbers to the letters so that I could sort the words according to this alphabet. If anyone is interested, I can post the relevant code.
>
>> And, why not? compare Py3.2 and Py3.3+ !
>
> What are you getting at?
>
> [1] http://home.comcast.net/~markg61/vlif.htm

Do not waste your/our time with JMF; he is a resident troll, but unlike some of our resident trolls I don't think he has ever contributed anything useful.

--
An actor's a guy who if you ain't talkin' about him, ain't listening. -- Marlon Brando
Re: Python 3 regex?
On Tuesday, January 13, 2015 at 10:06:50 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 12 Jan 2015 19:48:18 +, Ian wrote:
>> My recommendation would be to write a recursive descent parser for your files. That way will be easier to write,
>
> I know that writing parsers is a solved problem in computer science, and that doing so is allegedly one of the more trivial things computer scientists are supposed to be able to do,

A "solved" CS problem is often one where the solution is a proof that the problem is unsolvable :-)

http://blog.reverberate.org/2013/08/parsing-c-is-literally-undecidable.html
Re: Python 3 regex?
wxjmfa...@gmail.com wrote:
> On Tuesday, 13 January 2015 03:53:43 UTC+1, Rick Johnson wrote:
>> [...] you should find Python's text processing Nirvana [...]
> I recommend, you write a small application

I recommend you get a real name and do not post using the troll and spam-infested Google Groups, but a newsreader instead.

> sorting strings composed of latin characters, a sort based on diacritical characters

I do not think you need regular expressions for that: you can use Unicode collations.

> (and eventually, taking into account linguistic specific aspects).

BTDT. For a translator application, I used Python to sort a dictionary of the Latin phonetic transcription of Golic Vulcan whose alphabet is “S T P K R L A Sh O U D V Kh E H G Ch I N Zh M Y F Z Th W B” [1]. re helped a lot with that because inversely sorting the list by character length and turning it into an alternation allowed me to easily find the characters in words, and assign numbers to the letters so that I could sort the words according to this alphabet. If anyone is interested, I can post the relevant code.

> And, why not? compare Py3.2 and Py3.3+ !

What are you getting at?

[1] http://home.comcast.net/~markg61/vlif.htm

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
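Since the code itself was not posted, the following is only a guess at how the described technique could look: an alternation of the letters, longest first, is used to split a word into letters, and each letter's index in the alphabet becomes part of the sort key. All names are hypothetical and the sample words are made-up strings, not real vocabulary.

```python
import re

# The alphabet quoted in the message above, in its collation order.
ALPHABET = "S T P K R L A Sh O U D V Kh E H G Ch I N Zh M Y F Z Th W B".split()

# Rank of each (lower-cased) letter in that alphabet.
RANK = {letter.lower(): index for index, letter in enumerate(ALPHABET)}

# Longest letters first, so that 'sh' is tried before 's' at each position.
LETTER_RX = re.compile(
    '|'.join(sorted(map(re.escape, RANK), key=len, reverse=True)))

def sort_key(word):
    """Turn a word into the list of its letters' ranks."""
    # Characters that are not part of the alphabet are simply skipped.
    return [RANK[letter] for letter in LETTER_RX.findall(word.lower())]

words = ['shi', 'kha', 'ta', 'sa']  # made-up sample strings
print(sorted(words, key=sort_key))
# prints ['sa', 'ta', 'shi', 'kha']
```

Sorting the alternatives by length matters for the same reason discussed in the alternation issue elsewhere in this archive: the first alternative that matches at a position wins, so 's' listed before 'sh' would always split the digraph.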
Re: Python 3 regex?
On Tuesday, January 13, 2015 at 11:09:17 AM UTC-6, Rick Johnson wrote:
> [...]
> DO YOU NEED ME TO DRAW YOU A PICTURE?

I don't normally do this, but in the interest of education i feel i must bear the burdens for which all professional educators like myself are responsible.

https://plus.google.com/114883720122692827712/posts/Nxo3rR7TwQS
Re: Python 3 regex?
On Tuesday, January 13, 2015 at 12:39:55 AM UTC-6, Steven D'Aprano wrote:
> On Mon, 12 Jan 2015 15:47:08 -0800, Rick Johnson wrote:
>> [...]
>> # Ironic Twist (Reformatted)                               #
>> # Some diabetics, when confronted with hunger, think "I    #
>> # know, I'll eat a box of sugar cookies." -- now they have #
>> # two problems!                                            #
>
> Not the best of analogies, since there are two forms of diabetes. Those with Type 2 diabetes can best manage their illness by avoiding sugar cookies. Those with Type 1 should keep a box of sugar cookies (well, perhaps glucose lollies are more appropriate) on hand for emergencies.

You seem to misunderstand the basic distinction between type1 and type2 diabetes, it's not a mere dichotomy between hyperglycemia and hypoglycemia that defines a diabetes diagnosis, NO, Type1 can be simplified as insulin deficiency and Type2 as insulin resistance -- with both resulting in the inability of glucose (aka: fuel) to nourish the cells.

YOUR ASSESSMENT OF MY ANALOGY IS JUST AS WEAK.

Both my and Jamie's analogy present an example of the cruel irony. The only *DIFFERENCE* is that mine utilizes a subject matter which requires less study to understand. One can learn enough about diabetes to draw his own factual conclusions of my statement from a simple Google search, however, for regexps, a neophyte would need days, weeks, or even months of serious study to draw sensible conclusions of merit.

> In any case, most people with diabetes (or at least those who are still alive) are reasonably good at managing their illness and wouldn't make the choice you suggest. You have missed the point that people who misuse regexes are common in programming circles, while diabetics who eat a box of sugar cookies instead of a meal are rare.

I believe you could find many diabetics who've eaten poorly and suffered from the result -- even died! I'm not missing the point, you are! HECK, *I'M* THE ONE WHO *DEFINED* THE POINT.
> To take your analogy to an extreme: Some people, when faced with a problem, say "I know, I'll cut my arm off with a pocketknife!" Now they have two problems. This is not insightful or useful. Except in the most specialized and extreme circumstances, such as being trapped in the wilderness with a boulder on your arm, nobody would consider this to be good advice.

I'm not giving *advice*, i'm merely drawing parallels. I think your repeated failures to understand me are a result of your superficiality. When reading my posts, you need to learn to: "read between the lines". Many of the writings i author are implicit philosophical statements, musings, and/or explorations. For me, everything has deeper meanings, just begging to be *plundered*!

> But using regexes to validate email addresses or parse HTML? The internet is full of people who thought that was a good idea.

Again, i did not suggest that people have never done anything stupid with regexps, on the contrary, this list has borne witness to many of them. My only intention was to point out the damaging (albeit interesting) effects of propaganda.

MY WHOLE POINT IS ABOUT PROPAGANDA!

THAT'S IT!

DO YOU NEED ME TO DRAW YOU A PICTURE?
Re: Python 3 regex?
alister alister.nospam.w...@ntlworld.com writes:
> On Tue, 13 Jan 2015 04:36:38 +, Steven D'Aprano wrote:
>> On Mon, 12 Jan 2015 19:48:18 +, Ian wrote:
>>> My recommendation would be to write a recursive descent parser for your files. That way will be easier to write,
>>
>> I know that writing parsers is a solved problem in computer science, and that doing so is allegedly one of the more trivial things computer scientists are supposed to be able to do, but the learning curve to write parsers is if anything even higher than the learning curve to write a regex.
>>
>> I wish that Python made it as easy to use EBNF to write a parser as it makes to use a regex :-(
>>
>> http://en.wikipedia.org/wiki/Extended_Backus–Naur_Form
>
> I would not say that writing parsers is a solved problem. There may be solutions for a number of specific cases, but many cases still cause difficulty. As an example, I do not think there is a 100% complete parser for English (even native English speakers don't always get it).

There is no complete characterization of English as a set of character strings, nor will there ever be. Linguists have a slogan for this: All Grammars Leak. (They used to write formal grammars to characterize all and only the well-formed sentences of a language, or to capture necessary and sufficient conditions, and those grammars turned out to both over-generate and under-generate.)

Ambiguity doesn't help. In practice, it's not enough to find a parse. One wants a contextually appropriate parse. Sometimes this requires genuine understanding and knowledge.

Also in practice, one may not be in the business of rejecting ill-formed sentences: one wants to make partial sense of even those.

So, no, never 100 percent complete or 100 percent correct :)

The solved problem is the unambiguous parsing of formal languages that are defined by a formal grammar to begin with, like the configuration file format at hand.
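For a brace-delimited format like the one under discussion, a recursive descent parser can be quite small. The sketch below is a simplified illustration under stated assumptions, not a full ISC DHCPD parser: it assumes statements end in ";" or open a "{ ... }" block, and it ignores comments, quoted strings and unbalanced braces. The names TOKEN, parse_block and parse are hypothetical.

```python
import re

# Tokens: a single brace or semicolon, or a run of anything else non-blank.
TOKEN = re.compile(r'[{};]|[^\s{};]+')

def parse_block(tokens, pos):
    """Parse statements until a closing brace or the end of the input.

    Returns (statements, next_position). Each statement is a pair
    (words, body) where body is None for 'words;' and a nested list of
    statements for 'words { ... }'.
    """
    statements = []
    words = []
    while pos < len(tokens):
        token = tokens[pos]
        pos += 1
        if token == ';':
            statements.append((words, None))
            words = []
        elif token == '{':
            # Recurse: the nested call consumes up to the matching '}'.
            body, pos = parse_block(tokens, pos)
            statements.append((words, body))
            words = []
        elif token == '}':
            return statements, pos
        else:
            words.append(token)
    return statements, pos

def parse(text):
    """Tokenize the text and parse it as a sequence of statements."""
    statements, _ = parse_block(TOKEN.findall(text), 0)
    return statements

tree = parse('shared-network A { subnet 10.0.0.0 netmask 255.0.0.0 { '
             'option routers 10.0.0.1; } }')
print(tree)
```

Each level of nesting in the input becomes one level of recursion, which is why nested blocks that are awkward for a single regex are straightforward here.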
Re: Python 3 regex?
On Tue, 13 Jan 2015 04:36:38 +, Steven D'Aprano wrote: On Mon, 12 Jan 2015 19:48:18 +, Ian wrote: My recommendation would be to write a recursive decent parser for your files. That way will be easier to write, I know that writing parsers is a solved problem in computer science, and that doing so is allegedly one of the more trivial things computer scientists are supposed to be able to do, but the learning curve to write parsers is if anything even higher than the learning curve to write a regex. I wish that Python made it as easy to use EBNF to write a parser as it makes to use a regex :-( http://en.wikipedia.org/wiki/Extended_Backus–Naur_Form I would not say that writing parsers is a solved problem. there may be solutions for a number of specific cases but many cases still cause difficulty, as an example I do not think there is a 100% complete parser for English (even native English speakers don't always get it) -- Keep the number of passes in a compiler to a minimum. -- D. Gries -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex woes (parsing ISC DHCPD config)
Jason Bailey wrote: My script first reads the DHCPD configuration file into memory - variable filebody. It then utilizes the re module to find the configuration details for the wanted shared network. The config file might look something like this: ## shared-network My-Network-MOHE { subnet 192.168.0.0 netmask 255.255.248.0 { option routers 192.168.0.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.0.20 192.168.7.254; } } } shared-network My-Network-CDCO { subnet 192.168.8.0 netmask 255.255.248.0 { option routers 10.101.8.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.8.20 192.168.15.254; } } } shared-network My-Network-FECO { subnet 192.168.16.0 netmask 255.255.248.0 { option routers 192.168.16.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.16.20 192.168.23.254; } } } ## Suppose I'm trying to grab the shared network called My-Network-FECO from the above config file stored in the variable 'filebody'. First I have my variable 'shared_network' which contains the string My-Network-FECO. I compile my regex: m = re.compile(r^(shared\-network ( + re.escape(shared_network) + r) \{((\n|.|\r\n)*?)(^\})), re.MULTILINE|re.UNICODE) This code does not run as posted. Applying Occam’s Razor, I think you meant to post m = re.compile(r^(shared\-network ( + re.escape(shared_network) + r) \{((\n|.|\r\n)*?)(^\})), re.MULTILINE|re.UNICODE) (If you post long lines, know where your automatic word wrap happens.) I search for regex matches in my config file: m.search(filebody) I find using the identifier “m” for the expression very strange. Usually I reserve “m” to hold the *matches* for an expression on a string. Consider “r” or “rx” or something else instead of “m” for the expression. Unfortunately, I get no matches. From output on the command line, I can see that Python is adding extra backslashes to my re.compile string. 
I have added the raw 'r' in front of the strings to prevent it, but to no avail. Python is adding the extra backslashes because you used “r”. Note that the console-printed string representations of strings do not have an “r” in front of them. What you see is what you would have needed to write for equivalent code if you had not used “r”. (Different from some other languages, Python does not distinguish between single-quoted and double- quoted strings with regard to parsing. Hence the r'…' feature, the triple- quoted string, and the .format() method.) You get no matches because you have escaped the HYPHEN-MINUSes (“-”). You never need to escape those characters, in fact you must not do that here because r'\-' is not an (unnecessarily) escaped HYPHEN-MINUS, it is a literal backslash followed by a HYPHEN-MINUS, a character sequence that does not occur in your string. Outside of a character class you do not need to do that, and in a character class you can put it as first or last character instead (“[-…]” or “[…-]”). You have escaped the first HYPHEN-MINUS; re.escape() has escaped the other two for you: | re.escape('-') | '\\-' I presume this behavior is because of character classes, and the idea that the return value should work at any position in a character class. ISTM that you cannot use re.escape() here, and you must escape special characters yourself (using re.sub()), should they be possible in the file. I do not see a reason for making the entire expression a group (but for making the network name a group). You should refrain from parsing non-regular languages with a *single* regular expression (multiple expressions or expressions with alternation in a loop are usually fine; this can be used for building efficient parsers), even though Python’s regular expressions, which are not an exception there, are not exactly “regular” in the theoretical computer science sense. See the Chomsky hierarchy and Jeffrey E. F. 
Friedl’s insightful textbook “Mastering Regular Expressions”. It is possible that there is a Python module for parsing ISC dhcpd configuration files already. If so, you should use that instead. -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail. -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex woes (parsing ISC DHCPD config)
Thomas 'PointedEars' Lahn wrote: Jason Bailey wrote: shared-network My-Network-MOHE { […] { I compile my regex: m = re.compile(r^(shared\-network ( + re.escape(shared_network) + r) \{((\n|.|\r\n)*?)(^\})), re.MULTILINE|re.UNICODE) This code does not run as posted. Applying Occam’s Razor, I think you meant to post m = re.compile(r^(shared\-network ( + re.escape(shared_network) + r) \{((\n|.|\r\n)*?)(^\})), re.MULTILINE|re.UNICODE) […] You get no matches because you have escaped the HYPHEN-MINUSes (“-”). You never need to escape those characters, in fact you must not do that here because r'\-' is not an (unnecessarily) escaped HYPHEN-MINUS, it is a literal backslash followed by a HYPHEN-MINUS, a character sequence that does not occur in your string. Outside of a character class you do not need to do that, and in a character class you can put it as first or last character instead (“[-…]” or “[…-]”). You have escaped the first HYPHEN-MINUS; re.escape() has escaped the other two for you: | re.escape('-') | '\\-' I presume this behavior is because of character classes, and the idea that the return value should work at any position in a character class. It would appear that while my answer is not entirely wrong, the first sentence of that section is. You may escape the HYPHEN-MINUS there, and may use re.escape(); it has no effect on the expression because of what I said following that sentence. One must consider that the string is first parsed by Python’s string parser and then by Python’s re parser. So I have presently no specific idea why you get no matches, however r'\{((\n|.|\r\n)*?)(^\}' is not a proper way to match matching braces and everything in-between. To begin with, the proper expression to match any newline is r'(\r?\n|\r)' because the first matching alternative in an alternation, not the longest one, wins. But if you specify re.DOTALL, you can simply use “.” for any character (including any newline combination). 
[…] You should refrain from parsing non-regular languages with a *single* regular expression (multiple expressions or expressions with alternation in a loop are usually fine; this can be used for building efficient parsers), even though Python’s regular expressions, which are not an exception there, are not exactly “regular” in the theoretical computer science sense. See the Chomsky hierarchy and Jeffrey E. F. Friedl’s insightful textbook “Mastering Regular Expressions”. And for matching matching braces (sic!) with regular expressions, you need a recursive one (which is another extension of regular expressions as they are discussed in CS). Or a parser in the first place. Otherwise you match too much with greedy matching { { } } { { } } ^-^ or too little with non-greedy matching { { } } { { } } ^---^ CS regular expressions can be used to describe *regular* languages (Chomsky- type 3). Bracket languages are, in general, not regular (see “pumping lemma for regular languages”), so for them you need an PDA¹-like extension of CS regular expressions (the aforementioned recursive ones), or a PDA implementation in the first place. Such a PDA implementation is part of a parser. ¹ https://en.wikipedia.org/wiki/Pushdown_automaton -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail. -- https://mail.python.org/mailman/listinfo/python-list
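A minimal sketch of the two points above, using only the standard re module. The sample string is the one from the diagram; the helper name first_balanced is made up for illustration and stands in for the "PDA" idea with a plain depth counter. It also checks that an escaped HYPHEN-MINUS in a raw string still matches a plain "-", as the correction says.

```python
import re

text = "{ { } } { { } }"

# Greedy matching runs from the first "{" to the very last "}" (too much).
greedy = re.search(r"\{.*\}", text).group(0)

# Non-greedy matching stops at the first "}" it can reach (too little).
lazy = re.search(r"\{.*?\}", text).group(0)


def first_balanced(s):
    """Return the first balanced {...} group in s, or None.

    Hypothetical helper: a depth counter plays the role of the stack
    that a pushdown automaton would use.
    """
    depth = 0
    start = None
    for i, ch in enumerate(s):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return s[start:i + 1]
    return None


balanced = first_balanced(text)

# An escaped hyphen in a raw string still matches a plain hyphen.
hyphen_ok = re.search(r"shared\-network", "shared-network") is not None

print(greedy)    # { { } } { { } }
print(lazy)      # { { }
print(balanced)  # { { } }
print(hyphen_ok) # True
```

The counter only tracks one kind of bracket and ignores quoting and comments, so it is a starting point rather than a dhcpd.conf parser.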
Re: Python 3 regex?
On Monday, January 12, 2015 at 11:34:57 PM UTC-6, Mark Lawrence wrote: You snipped the bit that says [normal cobblers snipped]. Oh my, where are my *manners* today? Tell you what, next time when you're sneaking up behind me with a knife in hand, do be a friend and tap me on the shoulder first, so i can take the knife and stab *myself* in the back! Your pal, Rick. -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On 12/01/2015 18:03, Jason Bailey wrote: Hi all, I'm working on a Python _3_ project that will be used to parse ISC DHCPD configuration files for statistics and alarming purposes (IP address pools, etc). Anyway, I'm hung up on this one section and was hoping someone could provide me with some insight. My script first reads the DHCPD configuration file into memory - variable filebody. It then utilizes the re module to find the configuration details for the wanted shared network. Hi Jason, If you actually look at the syntax of what you are parsing, it is very simple. My recommendation would be to write a recursive descent parser for your files. That way will be easier to write, much easier to modify and almost certainly faster than an RE solution - and it can easily give you all the information in the file, thus future-proofing it. 'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' - Jamie Zawinski. Regards Ian -- https://mail.python.org/mailman/listinfo/python-list
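For readers wondering what such a parser might look like, here is a rough sketch, not a complete dhcpd.conf parser. It assumes a simplified grammar in which every statement is a run of words ended either by ";" or by a "{ ... }" block, and it ignores comments and quoted strings. The names TOKEN, parse_block and parse are invented for this example.

```python
import re

# Tokens: the three punctuation marks, or any run of other non-space characters.
TOKEN = re.compile(r"[{};]|[^\s{};]+")


def parse_block(tokens, pos, closing):
    """Parse statements until `closing` is seen (None means end of input).

    Returns (items, next_pos). Each item is (header, children), where
    children is None for a ";" statement and a list for a "{...}" block.
    """
    items = []
    words = []
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == ";":
            items.append((" ".join(words), None))
            words = []
        elif tok == "{":
            # Recursive step: a block contains statements of the same shape.
            children, pos = parse_block(tokens, pos, "}")
            items.append((" ".join(words), children))
            words = []
        elif tok == "}":
            if closing != "}":
                raise ValueError("unexpected '}'")
            return items, pos
        else:
            words.append(tok)
    if closing is not None:
        raise ValueError("missing '}'")
    return items, pos


def parse(text):
    items, _ = parse_block(TOKEN.findall(text), 0, None)
    return items


sample = """
shared-network My-Network-FECO {
    subnet 192.168.16.0 netmask 255.255.248.0 {
        option routers 192.168.16.1;
        pool {
            range 192.168.16.20 192.168.23.254;
        }
    }
}
"""

tree = parse(sample)
name, body = tree[0]
print(name)        # shared-network My-Network-FECO
print(body[0][0])  # subnet 192.168.16.0 netmask 255.255.248.0
```

The result is a nested list of (header, children) pairs, so pulling out one shared network, or every pool range in the file, becomes an ordinary tree walk.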
Re: Python 3 regex?
On Tue, Jan 13, 2015 at 5:03 AM, Jason Bailey jbai...@emerytelcom.com wrote: Unfortunately, I get no matches. From output on the command line, I can see that Python is adding extra backslashes to my re.compile string. I have added the raw 'r' in front of the strings to prevent it, but to no avail. Regexes are notoriously hard to debug. Is there any particular reason you _have_ to use one here? ISTM you could simplify it enormously by just looking for the opening string:
    shared_network = "My-Network-FECO"
    network = filebody.split("\nshared-network "+shared_network+" {",1)[1].split("\n}\n")[0]
Assuming your file is always correctly indented, and assuming you don't have any other instances of that header string, you should be fine. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
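The split-based suggestion can be tried on a cut-down config like this. The sample text is abbreviated from the original post; the variable name header is added here for readability. Note one limitation that follows from the leading "\n" in the header string: a network that is the very first line of the file would not be found.

```python
filebody = """shared-network My-Network-CDCO {
    subnet 192.168.8.0 netmask 255.255.248.0 {
        option routers 10.101.8.1;
    }
}

shared-network My-Network-FECO {
    subnet 192.168.16.0 netmask 255.255.248.0 {
        option routers 192.168.16.1;
    }
}
"""

shared_network = "My-Network-FECO"

# Everything after the wanted header, up to the first unindented "}" line.
header = "\nshared-network " + shared_network + " {"
network = filebody.split(header, 1)[1].split("\n}\n")[0]

print(network)
```

This relies on the closing brace of each shared-network block being the only "}" that sits alone at the start of a line, which is the indentation assumption mentioned above.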
Re: Python 3 regex woes (parsing ISC DHCPD config)
On 01/12/2015 01:20 PM, Jason Bailey wrote: Hi all, What changed between 1:03 and 1:20 that made you post a nearly identical second message, as a new thread? Unfortunately, I get no matches. From output on the command line, I can see that Python is adding extra backslashes to my re.compile string. I have added the raw 'r' in front of the strings to prevent it, but to no avail. What makes you think that? Please isolate this part of your problem with a simple short program, so we can diagnose it. You're probably getting confused between str() and repr(). The latter adds backslash escape sequences for good reason, and if you don't understand it, you might think the strings are getting corrupted. -- DaveA -- https://mail.python.org/mailman/listinfo/python-list
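A short illustration of the str() versus repr() point, assuming the "extra backslashes" were seen by echoing the pattern at the interactive prompt (the original post does not say exactly how they were observed). The pattern here is a shortened stand-in for the one in the question.

```python
import re

pattern = r"shared\-network \{"

shown_by_print = str(pattern)   # what print(pattern) displays
shown_by_repl = repr(pattern)   # what the interactive prompt echoes

print(shown_by_print)  # shared\-network \{
print(shown_by_repl)   # 'shared\\-network \\{'

# The string itself holds only two backslashes; repr() doubles them for display.
backslashes = pattern.count("\\")
print(backslashes)     # 2

# The pattern still matches the plain text.
matched = re.search(pattern, "shared-network {") is not None
print(matched)         # True
```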
Re: Python 3 regex?
On Tue, Jan 13, 2015 at 6:48 AM, Ian hobso...@gmail.com wrote: My recommendation would be to write a recursive descent parser for your files. That way will be easier to write, much easier to modify and almost certainly faster than an RE solution - and it can easily give you all the information in the file, thus future-proofing it. Generally, even a recursive descent parser will be overkill. It's pretty easy to do simple string manipulation to get the info you want; maybe that means restricting the syntax some, but for a personal-use script, that's usually no big cost. The example I gave requires that the indentation be correct, and on this mailing list, I think people agree that that's not a deal-breaker :) ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex woes (parsing ISC DHCPD config)
- Original Message - From: Jason Bailey jbai...@emerytelcom.com To: python-list@python.org Cc: Sent: Monday, January 12, 2015 7:20 PM Subject: Python 3 regex woes (parsing ISC DHCPD config) Hi all, I'm working on a Python _3_ project that will be used to parse ISC DHCPD configuration files for statistics and alarming purposes (IP address pools, etc). Anyway, I'm hung up on this one section and was hoping someone could provide me with some insight. My script first reads the DHCPD configuration file into memory - variable filebody. It then utilizes the re module to find the configuration details for the wanted shared network. The config file might look something like this: ## shared-network My-Network-MOHE { subnet 192.168.0.0 netmask 255.255.248.0 { option routers 192.168.0.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.0.20 192.168.7.254; } } } shared-network My-Network-CDCO { subnet 192.168.8.0 netmask 255.255.248.0 { option routers 10.101.8.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.8.20 192.168.15.254; } } } shared-network My-Network-FECO { subnet 192.168.16.0 netmask 255.255.248.0 { option routers 192.168.16.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.16.20 192.168.23.254; } } } ## Suppose I'm trying to grab the shared network called My-Network-FECO from the above config file stored in the variable 'filebody'. First I have my variable 'shared_network' which contains the string My-Network-FECO. I compile my regex: m = re.compile(r^(shared\-network ( + re.escape(shared_network) + r) \{((\n|.|\r\n)*?)(^\})), re.MULTILINE|re.UNICODE) I search for regex matches in my config file: m.search(filebody) Unfortunately, I get no matches. From output on the command line, I can see that Python is adding extra backslashes to my re.compile string. I have added the raw 'r' in front of the strings to prevent it, but to no avail. 
Thoughts on this?

Will the following work for you? My brain shuts down when I try to read your regex, but I believe you also used a non-greedy match.

Python 3.4.2 (default, Nov 20 2014, 13:01:11)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> cfg = """shared-network My-Network-MOHE {
...     subnet 192.168.0.0 netmask 255.255.248.0 {
...         option routers 192.168.0.1;
...         option tftp-server-name 192.168.90.12;
...         pool {
...             deny dynamic bootp clients;
...             range 192.168.0.20 192.168.7.254;
...         }
...     }
... }
...
... shared-network My-Network-CDCO {
...     subnet 192.168.8.0 netmask 255.255.248.0 {
...         option routers 10.101.8.1;
...         option tftp-server-name 192.168.90.12;
...         pool {
...             deny dynamic bootp clients;
...             range 192.168.8.20 192.168.15.254;
...         }
...     }
... }
...
... shared-network My-Network-FECO {
...     subnet 192.168.16.0 netmask 255.255.248.0 {
...         option routers 192.168.16.1;
...         option tftp-server-name 192.168.90.12;
...         pool {
...             deny dynamic bootp clients;
...             range 192.168.16.20 192.168.23.254;
...         }
...     }
... }"""
>>> import re
>>> re.findall(r"shared\-network (.+) \{?", cfg)
['My-Network-MOHE', 'My-Network-CDCO', 'My-Network-FECO']

-- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' - Jamie Zawinski. This statement is one of my favorite examples of powerful propaganda, which has scared more folks away from regexps than even the Upright Citizens Brigade could manage with their "Journey through the center of gas giant #7" and its resulting aggravated assault on American coinage! I wonder if Jamie's conclusions are a result of careful study, or merely, an attempt to resolve his own cognitive dissonance? Of course, if the latter is true, then i give him bonus points for his use of the third person to veil his own inadequacies -- nice Jamie, *very* nice! "Rick, it sounds like you're accusing Jamie of cowardice resulting in sour grapes?" Indeed! The problem with statements like his is that the ironic humor near the end *fortifies* the argument so much that the reader forgets the limits of the harm (quantified as: some people) and immediately accepts the consequences as affecting all people who choose to use regexps, or more generally, accepts the argument as a universal unbounded truth. Besides, who would want to be a member of a group for which the individuals are too stupid to know good choices from bad choices? HA, PEER PRESSURE, IT'S A POWERFUL THING! But there is more going on here than just mere sleight of forked tongue my friends, because, even the most accomplished propagandist cannot fool *most* of the people. No, this type of powerful propaganda only succeeds when the subject matter is both cryptic *AND* esoteric. For instance, in the following example, i contrive a similarly ironic statement to showcase the effects of such propaganda, but one that covers a subject matter which laymen either already understand or can easily attain enough knowledge of to appreciate the humor. # Ironic Twist # # Some diabetics, when confronted with hunger, think I# # know, I'll eat a box of sugar cookies. -- now they have # # two problems!' 
# "Wait a minute Rick! After eating the cookies the diabetic would no longer be hungry, so how could he have two problems? Your logic is flawed!" Au Contraire! Read the statement carefully, I said: When *CONFRONTED* with hunger -- the two problems (and the eventual side effect) exist at the *MOMENT* the diabetic considers eating the cookies. PROBLEM1: Need to eat! PROBLEM2: Cookies raise glucose too quickly. In this example, even a layman would understand that the statement is meant to showcase the irony of resolving a problem (hunger) with a solution (eating a box of cookies) that results in the undesirable outcome of hyperglycemia. And while this statement, and the one about regexps, both contain a factual underlying truth (basically that negative side effects should be avoided) the layman will lack the esoteric knowledge of regexps to confirm the factual basis for himself, and will instead blindly adopt the propagandist assertion as truth, based solely on the humorous prowess of the propagandist. The most effective propaganda utilizes the subconscious. You see, the role of propaganda is to modify behavior, and it is a more prevalent and powerful tool than most people realize! The propagandist will seek to control you; he'll use your ignorance against you; but you didn't notice because he made you laugh! WHO'S LAUGHING NOW? -- YOU MINDLESS ROBOTS! "But what's so evil about that Rick? He scared away a few feeble minded folks. SO WHAT!" I argue that we are all feeble minded in any subject we have not yet mastered. His propaganda (be it intentional or not) is so powerful that it defeats the neophyte before they can even begin. Because it gives them the false impression that regexps are only used by foolish people. Yes, i'll admit, regexps are very cryptic, but once you grasp their intricacies, you appreciate the succinctness of their syntax, because, what makes them so powerful is not only the extent of their pattern matching abilities, but their conciseness. 
-- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On Tue, Jan 13, 2015 at 10:47 AM, Rick Johnson rantingrickjohn...@gmail.com wrote: WHO'S LAUGHING NOW? -- YOU MINDLESS ROBOTS! It's very satisfying when mindless robots laugh. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On Monday, January 12, 2015 at 7:55:32 PM UTC-6, Mark Lawrence wrote: On 12/01/2015 23:47, Rick Johnson wrote: 'Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.' - Jamie Zawinski. [snip] If you wish to use a hydrogen bomb instead of a tooth pick feel free, I won't lose any sleep over it. Meanwhile I'll get on with writing code, and for the simple jobs that can be completed with string methods I'll carry on using them. When that gets too complicated I'll reach for the regex manual, knowing full well that there's enough data in books and online to help even a novice such as myself get over all the hurdles. If that isn't good enough then maybe a full blown parser, such as the pile listed here [snip] Mark, if you're going to quote me, then at least quote me in a manner that does not confuse the content of my post. The snippet you posted was not a statement of mine, rather, it was a quote that i was responding to, and without any context of my response, what is the point of quoting anything at all? It would be better to quote nothing and just say @Rick, then to quote something which does not have any context. Every python programmer worth his *SALT* should master the following three text processing capabilities of Python, and he should know how and when to apply them (for they all have strengths and weaknesses): (1) String methods: Simplistic API, but with limited capabilities -- but never underestimate the possibilities! (2) Regexps: Great for pattern matching with a powerful and concise syntax, but highly cryptic and unintuitive for the neophyte (and sometimes even the guru! *wink*). (3) Parsers: Great for extracting deeper meaning from text, but if pattern matching is all you need, then why not use (1) or (2) -- are you scared or uninformed? We can easily forgive a child who is afraid of the dark; the real tragedy of life is when men are afraid of the light. 
-- Plato IMHO, if you seek to only match patterns, then string methods should be your first choice, however, if the pattern is too difficult for string methods, then graduate to regexps. If you need to extract deeper meaning from text, by all means utilize a parser. But above all, don't fall for these religious teachings about how regexps are too difficult for mortals -- that's just hysteria. If you follow the outline i provided above, you should find Python's text processing Nirvana. -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On 12/01/2015 23:47, Rick Johnson wrote: 'Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.' - Jamie Zawinski. [normal cobblers snipped] If you wish to use a hydrogen bomb instead of a tooth pick feel free, I won't lose any sleep over it. Meanwhile I'll get on with writing code, and for the simple jobs that can be completed with string methods I'll carry on using them. When that gets too complicated I'll reach for the regex manual, knowing full well that there's enough data in books and online to help even a novice such as myself get over all the hurdles. If that isn't good enough then maybe a full blown parser, such as the pile listed here http://nedbatchelder.com/text/python-parsers.html ? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On Mon, 12 Jan 2015 19:48:18 +, Ian wrote: My recommendation would be to write a recursive descent parser for your files. That way will be easier to write, I know that writing parsers is a solved problem in computer science, and that doing so is allegedly one of the more trivial things computer scientists are supposed to be able to do, but the learning curve to write parsers is if anything even higher than the learning curve to write a regex. I wish that Python made it as easy to use EBNF to write a parser as it makes to use a regex :-( http://en.wikipedia.org/wiki/Extended_Backus–Naur_Form -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On 13/01/2015 02:53, Rick Johnson wrote: On Monday, January 12, 2015 at 7:55:32 PM UTC-6, Mark Lawrence wrote: On 12/01/2015 23:47, Rick Johnson wrote: 'Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.' - Jamie Zawinski. [snip] If you wish to use a hydrogen bomb instead of a tooth pick feel free, I won't lose any sleep over it. Meanwhile I'll get on with writing code, and for the simple jobs that can be completed with string methods I'll carry on using them. When that gets too complicated I'll reach for the regex manual, knowing full well that there's enough data in books and online to help even a novice such as myself get over all the hurdles. If that isn't good enough then maybe a full blown parser, such as the pile listed here [snip] Mark, if you're going to quote me, then at least quote me in a manner that does not confuse the content of my post. The snippet you posted was not a statement of mine, rather, it was a quote that i was responding to, and without any context of my response, what is the point of quoting anything at all? It would be better to quote nothing and just say @Rick, then to quote something which does not have any context. You snipped the bit that says [normal cobblers snipped]. Every python programmer worth his *SALT* should master the following three text processing capabilities of Python, and he should know how and when to apply them (for they all have strengths and weaknesses): (1) String methods: Simplistic API, but with limited capabilities -- but never underestimate the possibilities! (2) Regexps: Great for pattern matching with a powerful and concise syntax, but highly cryptic and unintuitive for the neophyte (and sometimes even the guru! *wink*). (3) Parsers: Great for extracting deeper meaning from text, but if pattern matching is all you need, then why not use (1) or (2) -- are you scared or uninformed? 
String methods, regexes, parsers, isn't that what I've already said above? Why repeat it? We can easily forgive a child who is afraid of the dark; the real tragedy of life is when men are afraid of the light. -- Plato IMHO, if you seek to only match patterns, then string methods should be your first choice, however, if the pattern is too difficult for string methods, then graduate to regexps. If you need to extract deeper meaning from text, by all means utilize a parser. I feel humbled that a great such as yourself is again repeating what I've already said. But above all, don't fall for these religious teachings about how regexps are too difficult for mortals -- that's just hysteria. If you follow the outline i provided above, you should find Python's text processing Nirvana. My favourite things in programming all go along the lines of DRY and KISS, with Although practicality beats purity being the most important of the lot. So called religious teachings never enter into my way of doing things. For example I can't stand code which jumps through hoops to avoid using GOTO, whereas nothing is cleaner than (say) GOTO ERROR. You'll (plural) find loads of them in cPython. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 regex?
On Mon, 12 Jan 2015 15:47:08 -0800, Rick Johnson wrote: 'Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.' - Jamie Zawinski. I wonder if Jamie's conclusions are a result of careful study, or merely, an attempt to resolve his own cognitive dissonance? Zawinski is one of the pantheon of geek demi-gods, with Linus, Larry, Guido, RMS, and a few others. (Just don't ask me to rank them. I'm not qualified.) His comment isn't based on a failure to grok regular expressions, but on an understanding that many people use regular expressions inappropriately. Here is more on the context of the famous quote: http://regex.info/blog/2006-09-15/247 (By the way, the quote actually wasn't original to JZ, he stole it from an all but identical quote about awk.) [...] For instance, in the following example, i contrive a similarly ironic statement to showcase the effects of such propaganda, but one that covers a subject matter in which laymen either: already understand, or, can easily attain enough knowledge to appreciate the humor. # Ironic Twist # # Some diabetics, when confronted with hunger, think I# # know, I'll eat a box of sugar cookies. -- now they have # # two problems!' Not the best of analogies, since there are two forms of diabetes. Those with Type 2 diabetes can best manage their illness by avoiding sugar cookies. Those with Type 1 should keep a box of sugar cookies (well, perhaps glucose lollies are more appropriate) on hand for emergencies. http://www.betterhealth.vic.gov.au/bhcv2/bhcarticles.nsf/pages/Diabetes_explained?open In any case, most people with diabetes (or at least those who are still alive) are reasonably good at managing their illness and wouldn't make the choice you suggest. You have missed the point that people who misuse regexes are common in programming circles, while diabetics who eat a box of sugar cookies instead of a meal are rare. 
To take your analogy to an extreme: Some people, when faced with a problem, say I know, I'll cut my arm off with a pocketknife! Now they have two problems. This is not insightful or useful. Except in the most specialised and extreme circumstances, such as being trapped in the wilderness with a boulder on your arm, nobody would consider this to be good advice. But using regexes to validate email addresses or parse HTML? The internet is full of people who thought that was a good idea. [...] Yes, i'll admit, regexps are very cryptic, but once you grasp their intricacies, you appreciate the succinctness of there syntax, because, what makes them so powerful is not only the extents of their pattern matching abilities, but their conciseness. Even Larry Wall says that regexes are too concise and cryptic: http://perl6.org/archive/doc/design/apo/A05.html -- Steve -- https://mail.python.org/mailman/listinfo/python-list
Python 3 regex?
Hi all, I'm working on a Python _3_ project that will be used to parse ISC DHCPD configuration files for statistics and alarming purposes (IP address pools, etc). Anyway, I'm hung up on this one section and was hoping someone could provide me with some insight. My script first reads the DHCPD configuration file into memory - variable filebody. It then utilizes the re module to find the configuration details for the wanted shared network. The config file might look something like this: ## shared-network My-Network-MOHE { subnet 192.168.0.0 netmask 255.255.248.0 { option routers 192.168.0.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.0.20 192.168.7.254; } } } shared-network My-Network-CDCO { subnet 192.168.8.0 netmask 255.255.248.0 { option routers 10.101.8.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.8.20 192.168.15.254; } } } shared-network My-Network-FECO { subnet 192.168.16.0 netmask 255.255.248.0 { option routers 192.168.16.1; option tftp-server-name 192.168.90.12; pool { deny dynamic bootp clients; range 192.168.16.20 192.168.23.254; } } } ## Suppose I'm trying to grab the shared network called My-Network-FECO from the above config file stored in the variable 'filebody'. First I have my variable 'shared_network' which contains the string "My-Network-FECO". I compile my regex: m = re.compile(r"^(shared\-network (" + re.escape(shared_network) + r") \{((\n|.|\r\n)*?)(^\}))", re.MULTILINE|re.UNICODE) I search for regex matches in my config file: m.search(filebody) Unfortunately, I get no matches. From output on the command line, I can see that Python is adding extra backslashes to my re.compile string. I have added the raw 'r' in front of the strings to prevent it, but to no avail. Thoughts on this? Thanks -- https://mail.python.org/mailman/listinfo/python-list
[issue23191] fnmatch regex cache use is not threadsafe
New submission from M. Schmitzer:

The way the fnmatch module uses its regex cache is not threadsafe. When multiple threads use the module in parallel, a race condition between retrieving a - presumed present - item from the cache and clearing the cache (because the maximum size has been reached) can lead to KeyError being raised. The attached script demonstrates the problem. Running it will (eventually) yield errors like the following.

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "fnmatch_thread.py", line 12, in run
    fnmatch.fnmatchcase(name, pat)
  File "/home/marc/.venv/modern/lib/python2.7/fnmatch.py", line 79, in fnmatchcase
    return _cache[pat].match(name) is not None
KeyError: 'lYwrOCJtLU'

--
components: Library (Lib)
files: fnmatch_thread.py
messages: 233650
nosy: mschmitzer
priority: normal
severity: normal
status: open
title: fnmatch regex cache use is not threadsafe
type: crash
versions: Python 2.7

Added file: http://bugs.python.org/file37642/fnmatch_thread.py
[issue23191] fnmatch regex cache use is not threadsafe
STINNER Victor added the comment: I guess that a lot of stdlib modules are not thread safe :-/ A workaround is to protect calls to fnmatch with your own lock.
-- nosy: +haypo
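The workaround mentioned above could look roughly like the sketch below. The names _fnmatch_lock and locked_fnmatchcase are hypothetical, and the lock only helps for calls that go through this wrapper; any other code that calls fnmatch directly is not protected.

```python
import fnmatch
import threading

# Hypothetical module-level lock guarding our own fnmatch calls.
_fnmatch_lock = threading.Lock()


def locked_fnmatchcase(name, pat):
    """Call fnmatch.fnmatchcase() while holding a private lock."""
    with _fnmatch_lock:
        return fnmatch.fnmatchcase(name, pat)


# Small demonstration: several threads matching through the wrapper.
results = []


def worker(i):
    results.append(locked_fnmatchcase("file%d.txt" % i, "*.txt"))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(results))
```

The cost is that matching is serialised across threads, which is probably acceptable for occasional calls but may matter in a hot loop.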
[issue23191] fnmatch regex cache use is not threadsafe
M. Schmitzer added the comment: Ok, if that is the attitude in such cases, feel free to close this.
[issue23191] fnmatch regex cache use is not threadsafe
STINNER Victor added the comment: It would be nice to fix the issue, but I don't know how it is handled in other stdlib modules.
[issue23191] fnmatch regex cache use is not threadsafe
Serhiy Storchaka added the comment: It is easy to make fnmatch caching thread safe without locks. Here is a patch. The problem with fnmatch is that the caching is implicit and a user doesn't know that any locks are needed. So either the need for a lock should be explicitly documented, or fnmatch should be made thread safe. The second option looks preferable to me. In 3.x fnmatch is thread safe because the thread-safe lru_cache is used.
-- keywords: +patch nosy: +serhiy.storchaka stage: -> patch review
Added file: http://bugs.python.org/file37643/fnmatch_threadsafe.patch
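To illustrate the lru_cache idea mentioned for 3.x, here is a rough sketch of caching compiled patterns with functools.lru_cache. This is not the code from the attached patch or from CPython itself; compile_pattern, cached_fnmatchcase and the maxsize value are made up for the example.

```python
import fnmatch
import functools
import re


@functools.lru_cache(maxsize=256)  # maxsize is an arbitrary choice here
def compile_pattern(pat):
    """Translate a shell-style pattern to a compiled regex, with caching."""
    return re.compile(fnmatch.translate(pat))


def cached_fnmatchcase(name, pat):
    """Case-sensitive match using the cached compiled pattern."""
    return compile_pattern(pat).match(name) is not None


print(cached_fnmatchcase("notes.txt", "*.txt"))
print(cached_fnmatchcase("notes.TXT", "*.txt"))
print(compile_pattern.cache_info())
```

Because the lookup and the eviction both happen inside lru_cache, there is no separate "check the cache, then read it" step for another thread to interrupt, which is the race described in the original report.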
[issue23191] fnmatch regex cache use is not threadsafe
M. Schmitzer added the comment: @serhiy.storchaka: My thoughts exactly, especially regarding the caching being implicit. From the outside, fnmatch really doesn't look like it could have threading issues. The patch also looks exactly like what I had in mind.
regex tool in the python source tree
I remember seeing here (couple of weeks ago??) a mention of a regex debugging/editing tool hidden away in the python source tree. Does someone remember the name/path? There are of course dozens of online ones... Looking for a python native tool
Re: regex tool in the python source tree
On Saturday, December 20, 2014 12:01:10 PM UTC+5:30, Rustom Mody wrote:
> I remember seeing here (couple of weeks ago??) a mention of a regex debugging/editing tool hidden away in the python source tree. Does someone remember the name/path? There are of course dozens of online ones... Looking for a python native tool

OK, I found redemo here: https://docs.python.org/3/howto/regex.html

I should also mention that that link mentions kodos as though it works. The kodos site http://sourceforge.net/projects/kodos/files/ lists files dated from 2002 (oldest) to 2006 (newest). Last I tried, it did not work even with Python 2.7, let alone 3.x.

If there's anything more up to date, I'd like to know.
[issue2636] Adding a new regex module (compatible with re)
Mateon1 added the comment: Well, I am reporting it here, is this not the correct place? Sorry if it is.
[issue2636] Adding a new regex module (compatible with re)
Changes by Brett Cannon br...@python.org:
-- nosy: +brett.cannon
[issue2636] Adding a new regex module (compatible with re)
Matthew Barnett added the comment: The page on PyPI says where the project's homepage is located: Home Page: https://code.google.com/p/mrab-regex-hg/ The bug was fixed in the last release.
[issue2636] Adding a new regex module (compatible with re)
Mateon1 added the comment: Well, I found a bug with this module. On Python 2.7(.5), on Windows 7 64-bit, when you try to compile a regex with the flags V1|DEBUG, the module crashes as if it wanted to call a builtin called ascii. The bug happened to me several times, but this is the regexp from the last time it happened: http://paste.ubuntu.com/8993680/ I hope it's fixed. I really love the module and found it very useful to have PCRE regexes in Python.
-- nosy: +Mateon1 versions: -Python 3.5
[issue2636] Adding a new regex module (compatible with re)
Matthew Barnett added the comment: @Mateon1: "I hope it's fixed"? Did you report it?
[issue22364] Improve some re error messages using regex for hints
Terry J. Reedy added the comment: I already said we should either stick with what we have if better (and gave examples, including sticking with 'cannot') or possibly combine the best of both if we can improve on both. 13 should use 'bytes-like' (already changed?). There is no review button.
[issue22364] Improve some re error messages using regex for hints
Serhiy Storchaka added the comment: Here is a patch which makes re error messages match regex. It doesn't look to me that all these changes are enhancements.
-- keywords: +patch
Added file: http://bugs.python.org/file37167/re_errors_regex.patch
[issue2636] Adding a new regex module (compatible with re)
Nick Coghlan added the comment: Thanks for pushing this one forward Serhiy! Your approach sounds like a fine plan to me.
[issue2636] Adding a new regex module (compatible with re)
Jeffrey C. Jacobs added the comment: If I recall, I started this thread with a plan to update re itself with implementations of various features listed in the top post. If you look at the list of files uploaded by me, there are some complete patches for re to add various features like Atomic Grouping. If we therefore wish to bring re up to the regex standard, we could start with those features.
[issue2636] Adding a new regex module (compatible with re)
Serhiy Storchaka added the comment:

Here is my (slowly implemented) plan:

0. Recommend regex as an advanced replacement for re (issue22594).
1. Fix all obvious bugs in the re module if this doesn't break backward compatibility (issue12728, issue14260, and many already closed issues).
2. Deprecate and then forbid behavior which looks like a bug, doesn't match regex in V1 mode and can't be fixed without breaking backward compatibility (issue22407, issue22493, issue22818).
3. Unify minor details with regex (issue22364, issue22578).
4. Fork regex and drop all advanced nonstandard features (such as fuzzy matching). Too many features make learning and using the module harder. They should be in the advanced module (regex).
5. Write benchmarks which cover all corner cases and compare re with regex case by case. Optimize the slower module. Currently re is faster than regex for all simple examples which I tried (maybe as a result of issue18685), but in the total results of the benchmarks (msg109447) re is slower.
6. Maybe implement some standard features which were rejected in favor of this issue (issue433028, issue433030). re should conform to at least Level 1 of UTS #18 (http://www.unicode.org/reports/tr18/#Basic_Unicode_Support).

In the best case, in 3.7 or 3.8 we could replace re with a simplified regex. Or by that time re will be free from bugs and warts.

-- nosy: +serhiy.storchaka
[issue2636] Adding a new regex module (compatible with re)
Antoine Pitrou added the comment:

> Here is my (slowly implemented) plan:

Exciting. Perhaps you should post your plan on python-dev. In any case, huge thanks for your work on the re module.
[issue2636] Adding a new regex module (compatible with re)
Serhiy Storchaka added the comment:

> Exciting. Perhaps you should post your plan on python-dev.

Thank you Antoine. I think all interested core developers are already aware of this issue. A disadvantage of posting on python-dev is that it would require manually copying the links, and maybe the titles, of all mentioned issues, while here they are available automatically. Oh, I'm lazy.
[issue2636] Adding a new regex module (compatible with re)
Ezio Melotti added the comment: So you are suggesting to fix bugs in re to make it closer to regex, and then replace re with a forked subset of regex that doesn't include advanced features, or just to fix/improve re until it matches the behavior of regex? If you are suggesting the former, I would also suggest checking the coverage and bringing it as close as possible to 100%.
[issue2636] Adding a new regex module (compatible with re)
Serhiy Storchaka added the comment:

> So you are suggesting to fix bugs in re to make it closer to regex, and then replace re with a forked subset of regex that doesn't include advanced features, or just to fix/improve re until it matches the behavior of regex?

Depends on what will be easier. Maybe some bugs are so hard to fix that replacing re with regex is the only solution. But if a fixed re is simpler and faster than a lightened regex and contains all necessary features, there will be no need for the replacement. Currently the code of regex looks more high level and better structured, but the code of re looks simpler and is much smaller. In any case, the closer re and regex are, the easier the migration will be.
[issue2636] Adding a new regex module (compatible with re)
Ezio Melotti added the comment: Ok, regardless of what will happen, increasing test coverage is a worthy goal. We might start by looking at the regex test suite to see if we can import some tests from there.
Re: Regex substitution trouble
massi_...@msn.com wrote:
> Hi everyone, I'm not really sure if this is the right place to ask about regular expressions, but since I'm using Python I thought I could give it a try :-)
> Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w]+\")", subst, oldstring)
>
> This works pretty well, but it has a problem: I would also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes". Can anyone help me to correct it? Thanks in advance!

Hi! The next snippet works for me:

re.sub(r'\$"([\s\w]+(\\")*[\s\w]+)+"', 'noop', r'$"te\"sts\"tri\"ng"')
[issue22364] Improve some re error messages using regex for hints
Raymond Hettinger added the comment: +1
-- nosy: +rhettinger
[issue22364] Improve some re error messages using regex for hints
Changes by Serhiy Storchaka storch...@gmail.com:
-- dependencies: +Add additional attributes to re.error, Other mentions of the buffer protocol
[issue22364] Improve some re error messages using regex for hints
Ezio Melotti added the comment: +1 on the idea.
-- stage: -> needs patch
Regex substitution trouble
Hi everyone, I'm not really sure if this is the right place to ask about regular expressions, but since I'm using Python I thought I could give it a try :-)

Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:

newstring = re.sub(ur"(?u)(\$\"[\s\w]+\")", subst, oldstring)

This works pretty well, but it has a problem: I would also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes". Can anyone help me to correct it? Thanks in advance!
Re: Regex substitution trouble
On Tue, Oct 28, 2014 at 10:02 PM, massi_...@msn.com wrote:
> I'm not really sure if this is the right place to ask about regular expressions, but since I'm using Python I thought I could give it a try :-)

Yeah, that sort of thing is perfectly welcome here. Same with questions about networking in Python, or file I/O in Python, or anything like that. Not a problem!

> Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w]+\")", subst, oldstring)
>
> This works pretty well, but it has a problem: I would also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes". Can anyone help me to correct it?

But this is a problem. You can use look-ahead assertions and such to allow the string \" inside your search string, but presumably the backslash ought itself to be escapable, in order to make it possible to have a loose backslash legal at the end of the string. I suggest that, instead of a regex, you look for a different way of parsing. What's the surrounding text like?

ChrisA
Re: Regex substitution trouble
Hi Chris, thanks for the reply. I tried to use look-ahead assertions; in particular, I modified the regex this way:

newstring = re.sub(ur"(?u)(\$\"[\s\w(?=\\)\"]+\")", subst, oldstring)

but it does not work. I'm absolutely not a regex guru, so I'm surely missing something. The strings I'm dealing with are similar to formulas, let's say something like:

'$["simple_input"]+$["messed_\\"_input"]+10'

Thanks for any help!
Re: Regex substitution trouble
(Please quote enough of the previous text to provide context, and write your replies underneath the quoted text - don't assume that everyone's read the previous posts. Thanks!)

On Tue, Oct 28, 2014 at 11:28 PM, massi_...@msn.com wrote:
> Hi Chris, thanks for the reply. I tried to use look-ahead assertions; in particular, I modified the regex this way:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w(?=\\)\"]+\")", subst, oldstring)
>
> but it does not work. I'm absolutely not a regex guru, so I'm surely missing something.

Yeah, I'm not a high-flying regex programmer either, so I'll leave the specifics for someone else to answer. Tip, though: Print out your regex, to see if it's really what you think it is. When you get backslashes and quotes coming through, sometimes you can get tangled, even in a raw string literal; sometimes, one quick print(some_re) can save hours of hair-pulling.

> The strings I'm dealing with are similar to formulas, let's say something like:
>
> '$["simple_input"]+$["messed_\\"_input"]+10'
>
> Thanks for any help!

Hmm. This looks like a job for ast.literal_eval with an actual dictionary. All you'd have to do is replace every instance of $ with a dict literal; it mightn't be efficient, but it would be safe. Using Python 2.7.8 as you appear to be on 2.x:

>>> expr = '$["simple_input"]+$["messed_\\"_input"]+10'
>>> values = {"simple_input":123, "messed_\"_input":75}
>>> ast.literal_eval(expr.replace("$",repr(values)))
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    ast.literal_eval(expr.replace("$",repr(values)))
  File "C:\Python27\lib\ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "C:\Python27\lib\ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

Unfortunately, it doesn't appear to work, as evidenced by the above message. It works with the full (and dangerous) eval, though:

>>> eval(expr.replace("$",repr(values)))
208

Can someone who better knows ast.literal_eval() explain what's malformed about this?

The error message in 3.4 is a little more informative, but not much more helpful:

ValueError: malformed node or string: <_ast.BinOp object at 0x0169BAF0>

My best theory is that subscripting isn't allowed, though this seems odd. In any case, it ought in theory to be possible to use Python's own operations on this. You might have to do some manipulation, but it'd mean you can leverage a full expression evaluator that already exists. I'd eyeball the source code for ast.literal_eval() and see about making an extended version that allows the operations you want. If you can use something other than a dollar sign - something that's syntactically an identifier - you'll be able to skip the textual replace() operation, which is risky (might change the wrong thing). Do that, and you could have your own little evaluator that uses the ast module for most of its work, and simply runs a little recursive walker that deals with the nodes as it finds them.

ChrisA
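A rough sketch of the "little recursive walker" idea follows, written for Python 3.9 or later (where a subscript's slice is the index expression itself). ast.literal_eval is documented as accepting only literal structures, which is consistent with the theory that subscripting and general arithmetic fall outside what it supports. Everything named here is an assumption made for illustration: the evaluate function, the placeholder identifier V used instead of the dollar sign, and the small set of allowed operators.

```python
import ast
import operator

# Operators this sketch chooses to allow; anything else is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr, values, name="V"):
    """Evaluate arithmetic over numbers and V["key"] lookups only."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id == name):
            key = node.slice  # the index expression itself on Python 3.9+
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                return values[key.value]
        # Calls, attribute access, other names, etc. all end up here.
        raise ValueError("unsupported expression: " + ast.dump(node))

    return walk(ast.parse(expr, mode="eval"))


values = {"simple_input": 123, 'messed_"_input': 75}
expr = 'V["simple_input"] + V["messed_\\"_input"] + 10'
print(evaluate(expr, values))  # 208
```

Since only whitelisted node types are evaluated, something like a function call in the expression raises ValueError instead of running, which is the property the plain eval() approach lacks.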
Re: Regex substitution trouble
On Tue, Oct 28, 2014 at 4:02 AM, massi_...@msn.com wrote:
> Hi everyone, I'm not really sure if this is the right place to ask about regular expressions, but since I'm using Python I thought I could give it a try :-)
> Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w]+\")", subst, oldstring)
>
> This works pretty well, but it has a problem: I would also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes". Can anyone help me to correct it? Thanks in advance!

Carefully reading the "Strings" section of "Example Regexes to Match Common Programming Language Constructs" [1] should (with a bit of effort) solve your problem, I think. Note the use of the negated character class, for one thing.

[1] http://www.regular-expressions.info/examplesprogrammer.html
Re: Regex substitution trouble
On Tuesday, October 28, 2014 7:03:00 AM UTC-4, mass...@msn.com wrote:
> Hi everyone, I'm not really sure if this is the right place to ask about regular expressions, but since I'm using Python I thought I could give it a try :-)
> Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w]+\")", subst, oldstring)
>
> This works pretty well, but it has a problem: I would also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes". Can anyone help me to correct it? Thanks in advance!

You have some good answers already, but I wanted to let you know about a tool you may already have which is useful for experimenting with regexps. On Windows, the file `redemo.py` is in the Tools/Scripts folder. If you're on a Mac, see http://stackoverflow.com/questions/1811236/how-can-i-run-redemo-py-or-equivalent-on-a-mac

It has really helped me work on some tough regexps.

good luck,
--Tim
Re: Regex substitution trouble
On 2014-10-28 12:28, massi_...@msn.com wrote:
> Hi Chris, thanks for the reply. I tried to use look-ahead assertions; in particular, I modified the regex this way:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w(?=\\)\"]+\")", subst, oldstring)
>
> but it does not work. I'm absolutely not a regex guru, so I'm surely missing something. The strings I'm dealing with are similar to formulas, let's say something like:
>
> '$["simple_input"]+$["messed_\\"_input"]+10'
>
> Thanks for any help!

Your original post said you wanted to match strings like:

$"somechars_with\\"doublequotes".

This regex will do that:

ur'\$"[^"\\]*(?:\\.[^"\\]*)*"'

However, now you say you want to match:

'$["simple_input"]'

This is different; it has '[' immediately after the '$' instead of '"'.
Re: Regex substitution trouble
On 28Oct2014 04:02, massi_...@msn.com massi_...@msn.com wrote:
> I'm not really sure if this is the right place to ask about regular expressions, but since I'm using Python I thought I could give it a try :-)
> Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:
>
> newstring = re.sub(ur"(?u)(\$\"[\s\w]+\")", subst, oldstring)
>
> This works pretty well, but it has a problem: I would also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes". Can anyone help me to correct it?

People seem to be making this harder than it should be. I'd just be fixing up your definition of what's inside the quotes. There seem to be 3 kinds of things:

- not a double quote or backslash
- a backslash followed by a double quote
- a backslash followed by not a double quote

Kind 3 is a policy call - take the following character or not? I would go with treating it like kind 2 myself. So you have:

  1    [^"\\]
  2    \\"
  3    \\[^"]

and fold 2 and 3 into:

  2+3  \\.

So your regexp inner becomes:

  ([^"\\]|\\.)*

and the whole thing becomes:

  \$"(([^"\\]|\\.)*)"

and as a raw string:

  ur'\$"(([^"\\]|\\.)*)"'

choosing single quotes to be more readable given the double quotes in the regexp.

Cheers,
Cameron Simpson c...@zip.com.au
--
cat: /Users/cameron/rc/mail/signature.: No such file or directory

Language... has created the word "loneliness" to express the pain of being alone. And it has created the word "solitude" to express the glory of being alone. - Paul Johannes Tillich
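The three-kinds breakdown above can be tried out with a short sketch like the following, written for Python 3 (so a plain r'...' prefix instead of ur'...'). The names QUOTED_VAR and substitute, and the sample formula, are made up for illustration; a non-capturing inner group is used so that findall() returns just the quoted content.

```python
import re

# $"..." where the inside is any run of: non-quote/non-backslash
# characters, or a backslash followed by any single character.
QUOTED_VAR = re.compile(r'\$"((?:[^"\\]|\\.)*)"')

formula = r'$"simple" + $"with \"quotes\" inside" + 10'


def substitute(text, replacement):
    """Replace every $"..." occurrence with the given string."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are not interpreted as escape sequences by re.sub.
    return QUOTED_VAR.sub(lambda m: replacement, text)


print(QUOTED_VAR.findall(formula))
print(substitute(formula, "X"))  # X + X + 10
```

Note that the captured text still contains the backslashes; unescaping it is a separate step that this sketch leaves out.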
[issue22594] Add a link to the regex module in re documentation
anupama srinivas murthy added the comment: I have modified the patch and listed the points I know. Could you review it?
-- versions: -Python 3.4, Python 3.5
Added file: http://bugs.python.org/file37052/regex-link.patch
[issue22594] Add a link to the regex module in re documentation
Changes by Serhiy Storchaka storch...@gmail.com:
-- components: +Regular Expressions
[issue10076] Regex objects became uncopyable in 2.5
Changes by Serhiy Storchaka storch...@gmail.com:
-- components: +Regular Expressions versions: +Python 3.4, Python 3.5 -Python 3.1
[issue22594] Add a link to the regex module in re documentation
anupama srinivas murthy added the comment: I have added the link and attached the patch below. Could you review it? Thank you
-- components: -Regular Expressions keywords: +patch nosy: +anupama.srinivas.murthy
Added file: http://bugs.python.org/file36900/regex-link.patch
[issue22594] Add a link to the regex module in re documentation
Georg Brandl added the comment:

> currently more bugfree and intended to replace re

The first part is spreading FUD if not explained in more detail. The second is probably never going to happen :(

-- nosy: +georg.brandl
[issue22594] Add a link to the regex module in re documentation
Ezio Melotti added the comment: +1
[issue22594] Add a link to the regex module in re documentation
Changes by Berker Peksag berker.pek...@gmail.com:
-- stage: -> needs patch
[issue22594] Add a link to the regex module in re documentation
Changes by Tshepang Lekhonkhobe tshep...@gmail.com:
-- nosy: +tshepang
[issue22594] Add a link to the regex module in re documentation
New submission from Serhiy Storchaka:

The regex module is proposed as a replacement for the standard re module. Of course we fix re bugs, but for now regex has fewer bugs. Even after all open re bugs are fixed, regex will still offer more features. It would be good to add a link to regex in the re documentation (just as the Tkinter documentation links to other GUI libraries).

--
assignee: docs@python
components: Documentation, Regular Expressions
keywords: easy
messages: 228961
nosy: docs@python, ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Add a link to the regex module in re documentation
type: enhancement
versions: Python 2.7, Python 3.4, Python 3.5
Re: help with regex
James Smith wrote:

> I want the last 1. I can't get this to work:
>
> pattern = re.compile(r"(\d+)$")
> match = pattern.match("LINE: 235 : Primary Shelf Number (attempt 1): 1")
> print match.group()

>>> pattern = re.compile(r"(\d+)$")
>>> match = pattern.search("LINE: 235 : Primary Shelf Number (attempt 1): 1")
>>> match.group()
'1'

See https://docs.python.org/dev/library/re.html#search-vs-match
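For readers who do not follow the link, here is a minimal sketch of why the original attempt fails, written for Python 3. The variable names `anchored` and `found` are illustrative and not from the thread. match() only tries the pattern at the start of the string, while search() scans forward until it finds a position where the pattern fits.

```python
import re

line = "LINE: 235 : Primary Shelf Number (attempt 1): 1"
pattern = re.compile(r"(\d+)$")

# match() only tries at position 0, where the text starts with "L",
# so the digits-at-end pattern cannot match and None is returned.
anchored = pattern.match(line)

# search() tries every starting position; "$" pins the digits to the end.
found = pattern.search(line)

print(anchored)        # None
print(found.group(1))  # 1
```

Calling .group() on the None returned by match() is what raises AttributeError in the original code.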
Re: help with regex
Peter Otten __pete...@web.de writes:

> >>> pattern = re.compile(r"(\d+)$")
> >>> match = pattern.search("LINE: 235 : Primary Shelf Number (attempt 1): 1")
> >>> match.group()
> '1'

An alternative way to accomplish the above using the ‘match’ method::

    >>> import re
    >>> pattern = re.compile(r"^.*:(?: *)(\d+)$")
    >>> match = pattern.match("LINE: 235 : Primary Shelf Number (attempt 1): 1")
    >>> match.groups()
    ('1',)

> See https://docs.python.org/dev/library/re.html#search-vs-match

Right. Always refer to the API documentation for the API you're attempting to use.

--
Ben Finney
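A short sketch of how the match-based variant might be wrapped for reuse, written for Python 3. The helper name `trailing_number` and the second sample line are hypothetical, and the pattern assumes a non-capturing `(?: *)` group for the optional spaces after the last colon. Since match() returns None when the line does not fit, the helper checks the result before asking for a group.

```python
import re

# Assumed pattern: anything up to the last colon, optional spaces
# (non-capturing), then the digits that end the line (captured).
TRAILING_NUMBER = re.compile(r"^.*:(?: *)(\d+)$")


def trailing_number(line):
    """Return the digits after the last colon as an int, or None."""
    m = TRAILING_NUMBER.match(line)
    # match() yields None on failure, so guard before calling group().
    return int(m.group(1)) if m else None


result = trailing_number("LINE: 235 : Primary Shelf Number (attempt 1): 1")
missing = trailing_number("LINE: 236 : no number here")
print(result, missing)  # 1 None
```

Because the group for the spaces is non-capturing, groups() holds only the digits, which is why the session above shows a one-element tuple.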