Re: [Tutor] Re Module
Asad,

After reading the replies to you from Alan and Steven, I want to ask whether you can first tell us in plain words exactly what the program is meant to do. If you only want help on one small part, tell us about that. I was first fooled into thinking you wanted to show us how you solve most of the entire problem, whatever it was, so I wanted to hear things like what I show next.

An example would be to search two files for error matches of various kinds and report whether they contain any matches. Just report True versus False or something. Another goal might be to show the first match in some way and then quit. Another might be to do the same search in two files and report ALL the matches in some format.

After being clear on the goal, you might specify the overall algorithm you want to use. For example, do you process one file to completion and save some results, then process the other the same way, then compare and produce output? Or do you process both nearly simultaneously in one pass, or perhaps in multiple passes? Do you search for one error type at a time or all at once? Can there be multiple errors on the same line, of the same kind or different ones? What does "error" even mean? Is it something like "Fail: 666" versus "Warn: 42", or something where multiple errors share a part, or ...

Once we have some idea of the goal, we could help you see whether the approach seems reasonable even before reading the code. And, when reading the code, we might see whether your implementation matches the plan, so perhaps we can see where you diverge from it, perhaps with a mistake.

If I just look at what you provided, you do some of what I asked. You are not clear on what the two files contain other than that they may have an error that you can identify with a set of patterns. Can you tell us whether you are looking at one line at a time, assuming it is a text file? Your code shows no evidence of a file at all.
Your focus in what you share with us is mainly on creating a list of compiled search patterns, applying it to one uninitialized "st", and trying to figure out which one matched. You do not show any examples of the patterns but suggest something is failing. For all we know, one of your patterns just matched the presence of a single common character, or was not even formatted properly and failed to compile.

My impression is that you are not actually asking about the overall problem. Your real question may be how to use a regular expression on a string and find out what matched. If so, that would be the headline, not the two files. And it may even be that your entire approach could change. An example would be to store your patterns in a dictionary, with the pattern text as the key and the compiled version as the value, so that when you evaluate a line using a pattern, you know which one you matched with. I am NOT saying this is a good solution or a better one. I am asking you to think about what you will need and what techniques might make life easier in doing it.

So besides trying to alter some code based on the feedback from others, could you resubmit the question with a focus on what you are doing and what exactly is not working that you want looked at? Specifics would be useful, including at least one pattern and a line of sample text that should be matched by the pattern, and perhaps one that should not. And any error messages are vital.

When you do, I am sure Steven and Alan and others might be able to zoom right in and help you diagnose, if you don't figure it out by yourself first by being able to see what your goal is and perhaps doing a little debugging.
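To make the dictionary idea concrete, here is a minimal sketch. It is written for Python 3 (the thread uses Python 2 print statements), and the pattern texts and sample lines are hypothetical, since the real ones were never posted.

```python
import re

# Hypothetical pattern texts; the thread never shows the real ones.
pattern_texts = [r"fail:\s*\d+", r"warn:\s*\d+"]

# Map each pattern's source text to its compiled form, applying the
# case-insensitive flag once, at compile time.
compiled = {text: re.compile(text, re.IGNORECASE) for text in pattern_texts}


def matching_patterns(line):
    """Return the source text of every pattern that matches the line."""
    return [text for text, regex in compiled.items() if regex.search(line)]


print(matching_patterns("Step 3 FAIL: 666 while copying"))
print(matching_patterns("all good"))
```

Because the function returns the pattern text, the caller always knows which pattern matched, and a line that matches several patterns reports all of them.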
-Original Message-
From: Tutor On Behalf Of Asad
Sent: Thursday, December 27, 2018 10:10 AM
To: tutor@python.org
Subject: [Tutor] Re Module

Hi All ,

I trying find a solution for my script , I have two files :

file1 - I need a search a error say x if the error matches

Look for the same error x in other file 2

Here is the code :
I have 10 different patterns therefore I used list comprehension and compiling the pattern so I loop over and find the exact pattern matching

    re_comp1 = [re.compile(pattern) for pattern in str1]

    for pat in re_comp1:
        if pat.search(st,re.IGNORECASE):
            x = pat.pattern
    print x            ===> here it gives the expected output it correct match
    print type(x)

    if re.search('x', line, re.IGNORECASE) is not None:    ===> Gives a wrong match
        print line

Instead if I use :

    if re.search(x, line, re.IGNORECASE) is not None:    then no match occurs
        print line

Please advice where I going wrong or what can be done to make it better .

Thanks,

--
Asad Hasan
+91 9582111698
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Re Module
On Thu, Dec 27, 2018 at 08:40:12PM +0530, Asad wrote:

> Hi All ,
>
> I trying find a solution for my script , I have two files :
>
> file1 - I need a search a error say x if the error matches
>
> Look for the same error x in other file 2
>
> Here is the code :
> I have 10 different patterns therefore I used list comprehension and
> compiling the pattern so I loop over and find the exact pattern matching
>
> re_comp1 = [re.compile(pattern) for pattern in str1]

You can move the IGNORECASE flag into the call to compile. Also, perhaps you can use better names instead of "str1" (one string?).

    patterns = [re.compile(pattern, re.IGNORECASE) for pattern in string_patterns]

> for pat in re_comp1:
>     if pat.search(st,re.IGNORECASE):
>         x = pat.pattern
> print x    ===> here it gives the expected output it correct match
> print type(x)

Be careful here: even though you have ten different patterns, only *one* will be stored in x. If three patterns match, x will only get the last of the three and the others will be ignored.

> if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match

That's because you are trying to match the literal string "x", so it will match anything with the letter "x":

    box, text, ax, equinox, except, hexadecimal, fix, Kleenex, sixteen ...

> Instead if I use :
>
> if re.search(x, line, re.IGNORECASE) is not None:  then no match occurs
>     print line

Here you are trying to match the variable called x. That is a very bad name for a variable (what does "x" mean?) but it should work. If no match occurs, it probably means that the value of x doesn't occur in the line you are looking at. Try printing x and line and see if they are what you expect them to be:

    print x
    print line

--
Steve
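The points above can be tried out with a short sketch. It uses Python 3 syntax, and the pattern strings and text are hypothetical stand-ins for the poster's str1 and st. It also illustrates one detail of the re module that the replies do not mention: on a compiled pattern, the second positional argument of search() is a start position, not a flags value.

```python
import re

# Hypothetical patterns and text standing in for the poster's str1 and st.
string_patterns = ["error", "timeout", "disk full"]
first_file_text = "ERROR: request TIMEOUT after 30s"

patterns = [re.compile(p, re.IGNORECASE) for p in string_patterns]

# Keep every pattern that matches instead of overwriting a single variable.
matched = [pat for pat in patterns if pat.search(first_file_text)]
print([pat.pattern for pat in matched])

# pattern.search(string, pos): re.IGNORECASE is the integer 2, so passing it
# here silently starts the search at index 2 instead of ignoring case.
ab = re.compile("ab")
print(ab.search("abc"))                 # a match object, found at index 0
print(ab.search("abc", re.IGNORECASE))  # None, the search starts at index 2

# A quoted 'x' is the literal letter x; the bare name x is the variable.
x = "error"
line = "text with no problems"
print(bool(re.search('x', line)))  # True, because "text" contains an x
print(bool(re.search(x, line)))    # False, "error" is not in the line
```

Passing the flag to re.compile(), as suggested above, avoids the start-position pitfall entirely.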
Re: [Tutor] Re Module
On 27/12/2018 15:10, Asad wrote:

> file1 - I need a search a error say x if the error matches
>
> Look for the same error x in other file 2
>
> Here is the code :
> I have 10 different patterns therefore I used list comprehension and
> compiling the pattern so I loop over and find the exact pattern matching
>
> re_comp1 = [re.compile(pattern) for pattern in str1]

I assume str1 is actually a list of strings? You don't show the definition, but since you say it gives the expected output I'll hope that it's correct.

> for pat in re_comp1:
>     if pat.search(st,re.IGNORECASE):
>         x = pat.pattern
> print x    ===> here it gives the expected output it correct

I assume st comes from your file1? You don't show us that bit of code either...

But you do realize that the print only shows the last result. If there is more than one matching pattern, the previous results get thrown away. And if you only care about one match you could just use a single regex. On the other hand, if you do only want the last matching pattern then what you have works.

> if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match
>     print line

Notice that you pass the string 'x' into the search. I assume it is meant to be x? That means you are searching for the single character 'x' in line. You also don't show us where line comes from. I assume it's the other file?

But why do you switch from using the compiled pattern? Why not just assign x to the pattern object pat? This can then be used to search line directly and with greater efficiency.

> if re.search(x, line, re.IGNORECASE) is not None:  then no match occurs
>     print line

And are you sure a match should occur? It would help debug this if you showed us some sample data, such as the value of x and the value of line.

Given that you are obviously only showing us a selected segment of your code, it's hard to be sure. But as written here you are searching line even if no pattern matches in file1. That is, you could loop through all your patterns, never assign anything to x, and then go ahead and try to search for 'x' in line. You should probably check x first.

Also, since you don't show the file looping code, we don't know whether you break out whenever you find a match or whether the rest of the code is all inside the first loop over file1.

Trying to debug someone else's code is hard enough. When we only have half the code we are reduced to guesswork.

Finally, do you get any error messages? If so, please post them in their entirety.

Based on your code I'm assuming you are working on Python v2.? but it's always worth posting the Python version and OS.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
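The suggestions about keeping the compiled pattern object and checking it before use could look roughly like this in Python 3. The file contents and patterns are hypothetical (real code would read the two files from disk), and the sketch assumes that "the same error" means "a line matching the same pattern", which the original question leaves open.

```python
import re

# Hypothetical stand-ins for the two files.
file1_lines = ["startup ok", "ORA-00600 internal error", "shutdown"]
file2_lines = ["nothing here", "saw ora-00600 again", "done"]
patterns = [re.compile(p, re.IGNORECASE) for p in (r"ora-\d+", r"tns-\d+")]


def first_matching_pattern(lines, patterns):
    """Return the first compiled pattern that matches any line, or None."""
    for line in lines:
        for pat in patterns:
            if pat.search(line):
                return pat
    return None


found = first_matching_pattern(file1_lines, patterns)

hits = []
if found is not None:  # only search the second file when something matched
    hits = [line for line in file2_lines if found.search(line)]
print(hits)
```

Returning None when nothing matches makes the "check x first" step explicit, so the second file is never searched with a stale or undefined pattern.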
[Tutor] Re Module
Hi All ,

I trying find a solution for my script , I have two files :

file1 - I need a search a error say x if the error matches

Look for the same error x in other file 2

Here is the code :
I have 10 different patterns therefore I used list comprehension and compiling the pattern so I loop over and find the exact pattern matching

    re_comp1 = [re.compile(pattern) for pattern in str1]

    for pat in re_comp1:
        if pat.search(st,re.IGNORECASE):
            x = pat.pattern
    print x            ===> here it gives the expected output it correct match
    print type(x)

    if re.search('x', line, re.IGNORECASE) is not None:    ===> Gives a wrong match
        print line

Instead if I use :

    if re.search(x, line, re.IGNORECASE) is not None:    then no match occurs
        print line

Please advice where I going wrong or what can be done to make it better .

Thanks,

--
Asad Hasan
+91 9582111698
Re: [Tutor] re module
Hey, thanks Danny Yoo, Chris “Kwpolska” Warrick, D.V.N. Sarma.

I will take all your inputs.

Thanks a lot.

On Fri, Aug 15, 2014 at 3:32 AM, Danny Yoo d...@hashcollision.org wrote:

> On Thu, Aug 14, 2014 at 8:39 AM, D.V.N.Sarma డి.వి.ఎన్.శర్మ dvnsa...@gmail.com wrote:
>> I tested it on IDLE. It works.
>
> Hi Sarma,
>
> Following up on this one. I'm pretty sure that:
>
>     print re.search("span style=\"(.*)\"", stmt).group()
>
> is going to print something, but it almost certainly will not do what Sunil wants. See:
>
>     https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy
>
> for why.
[Tutor] re module
Hi,

I have string like

stmt = '<p><span style="font-size: 11pt;"><span style="font-family: times new roman,times;">Patient name:&nbsp;Upadhyay Shyam</span><span style="font-family: times new roman,times;">&nbsp;&nbsp;<br />Date of birth:&nbsp;&nbsp;&nbsp;08/08/1988 <br />Issue(s) to be analyzed:&nbsp;&nbsp;tes</span></span><br /><span style="font-size: 11pt;"><span style="font-family: times new roman,times;">Nurse Clinical summary:&nbsp;&nbsp;test1</span><span style="font-family: times new roman,times;">&nbsp;<br /><br />Date of injury:&nbsp;&nbsp;&nbsp;12/14/2013</span><br /><span style="font-family: times new roman,times;">Diagnoses:&nbsp;&nbsp;&nbsp;723.4 - 300.02 - 298.3 - 780.50 - 724.4&nbsp;Brachial neuritis or radiculitis nos - Generalized anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance - Thoracic or lumbosacral neuritis or radiculitis, unspecified</span><br /><span style="font-family: times new roman,times;">Requester name:&nbsp;&nbsp;&nbsp;Demo Spltycdtestt</span><br /><span style="font-family: times new roman,times;">Phone #:&nbsp;&nbsp;&nbsp;(213) 480-9000</span><br /><br /><span style="font-family: times new roman,times;">Medical records reviewed <br />__ pages of medical and administrative records were reviewed including:<br /><br /><br />Criteria used in analysis <br />&nbsp;<br /><br />Reviewer comments <br /><br /><br />Determination<br />Based on the clinical information submitted for this review and using the evidence-based, peer-reviewed guidelines referenced above, this request is&nbsp;<br /><br />Peer Reviewer Name/Credentials&nbsp;&nbsp;</span><br /><span style="font-family: times new roman,times;">Solis, Test,&nbsp;PhD</span><br /><span style="font-family: times new roman,times;">Internal Medicine</span><br /><span style="font-family: times new roman,times;">&nbsp;</span><br /><br /><span style="font-family: times new roman,times;">Attestation<br /><br /><br />Contact Information</span><span style="font-family: times new roman,times;">&nbsp;<br /></span></span></p><br/><font face=\'times new roman,times\' size=\'3\'>Peer to Peer contact attempt 1: 08/13/2014 02:46 PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did Not Change Determination</font>'

i am trying to find the various font sizes and font face from this string.

i tried

    print re.search("span style=\"(.*)\"", stmt).group()

Thank you.
Re: [Tutor] re module
On 14 Aug 2014 15:58 Sunil Tech sunil.tech...@gmail.com wrote:

> Hi,
>
> I have string like
>
> stmt = [...]
>
> i am trying to find the various font sizes and font face from this string.
>
> i tried
>
>     print re.search("span style=\"(.*)\"", stmt).group()
>
> Thank you.

Don't use regular expressions for HTML. Use lxml instead.

Also, why would you need that exact thing? It's useless. Also, this code is very ugly, with too many spans and, worse, fonts which should not be used at all.

--
Chris “Kwpolska” Warrick http://chriswarrick.com/
Sent from my SGS3.
Re: [Tutor] re module
I tested it on IDLE. It works.

regards,
Sarma.

On Thu, Aug 14, 2014 at 7:37 PM, Chris “Kwpolska” Warrick kwpol...@gmail.com wrote:

> On 14 Aug 2014 15:58 Sunil Tech sunil.tech...@gmail.com wrote:
>> [...]
>> i am trying to find the various font sizes and font face from this string.
>>
>> i tried
>>
>>     print re.search("span style=\"(.*)\"", stmt).group()
>
> Don't use regular expressions for HTML. Use lxml instead.
>
> [...]
Re: [Tutor] re module
On Thu, Aug 14, 2014 4:07 PM CEST Chris “Kwpolska” Warrick wrote:

> On 14 Aug 2014 15:58 Sunil Tech sunil.tech...@gmail.com wrote:
>> [...]
>> i am trying to find the various font sizes and font face from this string.
>>
>> i tried
>>
>>     print re.search("span style=\"(.*)\"", stmt).group()
>
> Don't use regular expressions for HTML. Use lxml instead.
>
> Also, why would you need that exact thing? It's useless. Also, this code is very ugly, with too many spans and, worse, fonts which should not be used at all.

Why lxml and not bs? I read that bs deals better with malformed html. You said the above html is messy, which is not necessarily the same as malformed, but..

Anyway, this reference also seems to favor lxml:
http://stackoverflow.com/questions/4967103/beautifulsoup-and-lxml-html-what-to-prefer
Re: [Tutor] re module
On Thu, Aug 14, 2014 at 8:39 AM, D.V.N.Sarma డి.వి.ఎన్.శర్మ dvnsa...@gmail.com wrote:

> I tested it on IDLE. It works.

Hi Sarma,

Following up on this one. I'm pretty sure that:

    print re.search("span style=\"(.*)\"", stmt).group()

is going to print something, but it almost certainly will not do what Sunil wants. See:

    https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy

for why.
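The greedy versus non-greedy difference can be seen on a much shorter string. This is a small Python 3 sketch; the html fragment is a hypothetical, cut-down stand-in for the posted stmt.

```python
import re

# Hypothetical fragment in the spirit of the posted string.
html = '<span style="font-size: 11pt;"><span style="font-family: times;">Name</span>'

# Greedy: .* runs on to the LAST double quote in the string.
greedy = re.search(r'span style="(.*)"', html).group(1)

# Non-greedy: .*? stops at the first closing quote, once per span.
lazy = re.findall(r'span style="(.*?)"', html)

print(greedy)
print(lazy)
```

The greedy version returns one long string that swallows the second span's opening tag, while the non-greedy version returns the two style values separately.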
Re: [Tutor] re module
Hi Sunil,

Don't use regular expressions for this task. Use something that knows about HTML structure. As others have noted, the Beautiful Soup or lxml libraries are probably a much better choice here.

There are good reasons to avoid regexps for the task you're trying to do. For example, your regular expression:

    "span style=\"(.*)\""

does not respect the string boundaries of attributes. You may think that ".*" matches just content within a string attribute, but this is not true. For example, see the following:

##
>>> import re
>>> m = re.match("'(.*)'", "'quoted' text, but note how it's greedy!")
>>> m.group(1)
"quoted' text, but note how it"
##

and note how the match doesn't limit itself to "quoted", but goes as far as it can.

This shows at least one of the problems that you're going to run into. Fixing this so it doesn't grab so much is doable, of course. But there are other issues, all of which are little headaches upon headaches. (e.g. attribute values may be single or double quoted, may use HTML entity references, etc.)

So don't try to parse HTML by hand. Let a library do it for you. For example, with Beautiful Soup:

    http://www.crummy.com/software/BeautifulSoup/bs4/doc/

the code should be as straightforward as:

###
from bs4 import BeautifulSoup
soup = BeautifulSoup(stmt)
for span in soup.find_all('span'):
    print span.get('style')
###

where you deal with the _structure_ of your document, rather than with the low-level individual characters of that document.
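If installing a third-party library is not an option, the same structural idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not the Beautiful Soup approach recommended above; it is written for Python 3, the StyleCollector class name is made up for illustration, and the sample markup is a hypothetical, shortened version of the posted string.

```python
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Collect style attributes of span tags and face/size of font tags."""

    def __init__(self):
        super().__init__()
        self.styles = []
        self.fonts = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attrs = dict(attrs)
        if tag == "span" and "style" in attrs:
            self.styles.append(attrs["style"])
        elif tag == "font":
            self.fonts.append((attrs.get("face"), attrs.get("size")))


# Hypothetical, shortened version of the posted markup.
sample = ('<p><span style="font-size: 11pt;">'
          '<span style="font-family: times new roman,times;">Patient</span>'
          "</span></p><font face='times new roman,times' size='3'>Peer</font>")

collector = StyleCollector()
collector.feed(sample)
print(collector.styles)
print(collector.fonts)
```

The parser copes with single or double quoted attribute values, which is one of the headaches a hand-written regular expression would have to handle explicitly.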
Re: [Tutor] re module- puzzling results when matching money
Hi,

not quite. The moral is to learn about greedy and non-greedy matching ;)!

-nik

Alex Kleider aklei...@sonic.net wrote:

> On 2013-08-03 13:38, Dominik George wrote:
>> Hi,
>>
>> \b is defined as all non-word characters, so it is the complement of \w. \w is [A-Za-z0-9_-], so \b includes \$ and thus cuts off your sign group.
>>
>> -nik
>
> I get it now. I was using it before the '$' to define the beginning of a word but I think things are failing because it detects an end of word. Anyway, the moral is not to use it with anything but \w!
>
> Thanks!

--
This message was sent from my Android mobile phone with K-9 Mail.
Re: [Tutor] re module- puzzling results when matching money
On 04/08/13 08:45, Alex Kleider wrote:

>> sorry, my bad. I forgot to delete that backslash, I meant re.findall(r"\be\b", "d e f"). Same with the other example.
>
> ..but the interesting thing is that the presence or absence of the spurious back slashes seems not to change the results.

It wouldn't, because the backslash says treat the next character as a literal, and if it's not a metacharacter it's already treated as a literal. So the \ is effectively a non-operation in that context.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] re module- puzzling results when matching money
Hi,

\b is defined as all non-word characters, so it is the complement of \w. \w is [A-Za-z0-9_-], so \b includes \$ and thus cuts off your sign group.

-nik

Alex Kleider aklei...@sonic.net wrote:

#!/usr/bin/env python
"""
I've been puzzling over the re module and have a couple of questions
regarding the behaviour of this script.

I've provided two possible patterns (re_US_money): the one surrounded by
the 'word boundary' meta sequence seems not to work while the other one
does. I can't understand why the addition of the word boundary defeats
the match.

I also don't understand why the split method includes the matched text.
Splitting only works as I would have expected if no groupings are used.

If I've set this up as intended, the full body of this e-mail should be
executable as a script.

Comments appreciated.
alex kleider
"""

# file : tutor.py (Python 2.7, NOT Python 3)
print 'Running tutor.py on an Ubuntu Linux machine. *'

import re

target = \
"""Cost is $4.50. With a $.30 discount:
Price is $4.15.
The price could be less, say $4 or $4.
Let's see how this plays out: $4.50.60
"""

# Choose one of the following two alternatives:
re_US_money =\
r"((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})"
# The above provides matches.
# The following does NOT.
# re_US_money =\
# r"\b((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})\b"

pat_object = re.compile(re_US_money)
match_object = pat_object.search(target)
if match_object:
    print "'match_object.group()' and 'match_object.span()' yield:"
    print match_object.group(), match_object.span()
    print
else:
    print "NO MATCH FOUND!!!"
print
print "Now will use 'finditer()':"
print
iterator = pat_object.finditer(target)
i = 1
for iter in iterator:
    print
    print "iter #%d: "%(i, ),
    print iter.group()
    print "'groups()' yields: '%s'."%(iter.groups(), )
    print iter.span()
    i += 1
    sign = iter.group("sign")
    dollars = iter.group("dollars")
    cents = iter.group("cents")
    print sign,
    print "",
    if dollars:
        print dollars,
    else:
        print "00",
    print "",
    if cents:
        print cents,
    else:
        print "00",
    print

t = target
sub_target = pat_object.sub("insert value here", t)
print
print "Printing substitution: "
print sub_target
split_target = pat_object.split(target)
print "Result of splitting on the target: "
print split_target
# End of script.

--
This message was sent from my Android mobile phone with K-9 Mail.
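Both puzzles in the script can be reproduced with a smaller Python 3 sketch. According to the re documentation, \b matches the empty string at a position between a \w character and a \W character (or the start or end of the string), and \w covers letters, digits and the underscore but not the hyphen. The simplified money pattern below is an assumption made for illustration, not the script's exact pattern.

```python
import re

target = "Cost is $4.50. With a $.30 discount: Price is $4.15."

# Simplified, hypothetical money pattern without named groups.
money = r"\$\d*(?:\.\d{2})?"
plain = re.compile(money)
bounded = re.compile(r"\b" + money + r"\b")

print(plain.findall(target))
# A space followed by "$" is two non-word characters in a row, so there is
# no word boundary in front of the "$" and the bounded pattern cannot match.
print(bounded.findall(target))
# With a word character directly before the "$", a boundary does exist.
print(bounded.findall("USD$4.50"))

# split() drops the separators unless the pattern has capturing groups,
# in which case the text of every group is included in the result.
print(re.split(r"\$\d+", "a $1 b $2 c"))
print(re.split(r"(\$)(\d+)", "a $1 b $2 c"))
```

So the word boundary fails in the script because "$" itself is not a word character, and split() returns the matched pieces because the pattern uses capturing groups.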
[Tutor] re module help
Hi Gurus,

I have written a script using regular expressions and the os module. It writes the output of sdptool to a file named sdptool, matches the regular expression against that file, and prints the result. I want to get the same output without creating the file. I tried reading directly from the stream, but I didn't get the correct output.

#! /usr/bin/python
import os,re

def scan():
    cmd = "sdptool -i hci0 search OPUSH > sdptool"
    fp = os.popen(cmd)
    results = []
    l = open("sdptool").read()
    pattern = r"^Searching for OPUSH on (\w\w(:\w\w)+).*?Channel: (\d+)"
    r = re.compile(pattern, flags=re.MULTILINE|re.DOTALL)
    while True:
        for match in r.finditer(l):
            g = match.groups()
            results.append((g[0],'phone',g[2]))
        return results

## output
[('00:15:83:3D:0A:57', 'phone', '1')]

http://dpaste.com/684335/

Please guide me on how to achieve the required output without creating a file.

Did I learn something today? If not, I wasted it.
Re: [Tutor] re module help
You could use read directly on the popen call to avoid having to write to a file:

    output = os.popen("sdptool -i hci0 search OPUSH").read()

Bodsda
Sent from my BlackBerry® wireless device

-Original Message-
From: Ganesh Kumar bugcy...@gmail.com
Sender: tutor-bounces+bodsda=googlemail@python.org
Date: Mon, 9 Jan 2012 14:47:46
To: tutor@python.org
Subject: [Tutor] re module help
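Building on that suggestion, here is a rough Python 3 sketch that keeps everything in memory. It swaps os.popen for subprocess.check_output, which is one common alternative rather than what the thread proposes, and splits the parsing into its own function so it can be tried without a Bluetooth adapter. The parse_opush name and the sample text are made up for illustration; real sdptool output will differ.

```python
import re
import subprocess

PATTERN = re.compile(
    r"^Searching for OPUSH on (\w\w(:\w\w)+).*?Channel: (\d+)",
    flags=re.MULTILINE | re.DOTALL,
)


def parse_opush(text):
    """Extract (address, 'phone', channel) tuples from sdptool output."""
    return [(m.group(1), "phone", m.group(3)) for m in PATTERN.finditer(text)]


def scan():
    """Run sdptool and parse its output in memory, with no temporary file."""
    output = subprocess.check_output(
        ["sdptool", "-i", "hci0", "search", "OPUSH"],
        universal_newlines=True,  # return str rather than bytes
    )
    return parse_opush(output)


# Hypothetical sample of what the command might print.
sample = (
    "Searching for OPUSH on 00:15:83:3D:0A:57 ...\n"
    "Service Name: OBEX Object Push\n"
    "  Protocol Descriptor List:\n"
    '    "RFCOMM" (0x0003)\n'
    "      Channel: 1\n"
)
print(parse_opush(sample))
```

Calling scan() on a machine with sdptool installed should then return the same kind of list as the original script, without the intermediate file or the `while True` loop.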
Re: [Tutor] RE module is working ?
Karim wrote:

> Recall:
>
> >>> re.subn(r'([^\\])?"', r'\1\\"', expression)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
>     return _compile(pattern, flags).subn(repl, string, count)
>   File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
>     return sre_parse.expand_template(template, match)
>   File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
>     raise error, "unmatched group"
> sre_constants.error: unmatched group
>
> Found the solution: '?' needs to be inside parenthesis (saved pattern) because outside we don't know if the saved match argument will exist or not namely '\1'.
>
> >>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
> (' ', 2)
>
> sed unix command is more permissive: sed 's/\([^\\]\)\?"/\1\\"/g' because '?' can be outside parenthesis (saved pattern but escaped for sed). \1 seems to not cause issue when matching is found. Perhaps it is created only when match occurs.

Thanks for reporting the explanation.
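The difference between the two placements of '?' can be checked with a short sketch. It is written for Python 3 and uses a hypothetical input string, since the thread does not show the value of expression. The try/except is there because the behaviour depends on the Python version: the traceback above comes from Python 2.7, while the re documentation for Python 3.5 and later says unmatched groups in a replacement are substituted with an empty string.

```python
import re

expression = 'say "hi"'  # hypothetical input

# Optional part inside the group: group 1 always takes part in the match,
# possibly as an empty string, so \1 is always defined.
print(re.subn(r'([^\\]?)"', r'\1\\"', expression))

# Optional part outside the group: when the quote is the first character,
# group 1 does not take part in the match at all.
try:
    print(re.subn(r'([^\\])?"', r'\1\\"', '"hi"'))
except re.error as exc:
    print("error:", exc)
```

Note that neither form protects quotes that are already escaped, which is the point raised in the follow-up messages.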
Re: [Tutor] RE module is working ?
Karim wrote:

> That is not the thing I want. I want to escape any " which are not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have made regex on unix since 15 years).

Can the backslash be escaped, too? If so, I don't think your regex does what you think it does.

    r'\\\"'   # escaped \ followed by escaped "

should not be altered, but:

    $ echo '\\\"' | sed 's/\([^\\]\)\?"/\1\\"/g'
    # two escaped \ followed by a " that is not escaped
Re: [Tutor] RE module is working ?
On 02/04/2011 02:36 AM, Steven D'Aprano wrote:
> Karim wrote:
> > > > *Indeed what's the matter with RE module!?*
> > > You should really fix the problem with your email program first;
> > Thunderbird issue with bold type (appears as stars) but I don't know how to fix it yet.
>
> A man went to a doctor and said, "Doctor, every time I do this, it hurts. What should I do?"
>
> The doctor replied, "Then stop doing that!"
>
> :)

Yes, these words made me laugh. I will keep it in my funny box.

> Don't add bold or any other formatting to things which should be program code. Even if it looks okay in *your* program, you don't know how it will look in other people's programs. If you need to draw attention to something in a line of code, add a comment, or talk about it in the surrounding text.
>
> [...]
> > That is not the thing I want. I want to escape any " which are not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have made regex on unix since 15 years).

Mainly sed, awk and perl, sometimes grep and egrep. I know this is the jungle.

> Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU posix compliant regexes? grep or egrep regexes? They're all different.
>
> In any case, I am sorry, I don't think your regex does what you say. When I try it, it doesn't work for me.
>
> [steve@sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
> Some \\"text\"

I give you my word on this. This is the exact output, I redid it:

#MY OS VERSION
karim@Requiem4Dream:~$ uname -a
Linux Requiem4Dream 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux

#MY SED VERSION
karim@Requiem4Dream:~$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

GNU sed home page: http://www.gnu.org/software/sed/.
General help using GNU software: http://www.gnu.org/gethelp/.
E-mail bug reports to: bug-gnu-ut...@gnu.org.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.

#MY SED OUTPUT COMMAND:
karim@Requiem4Dream:~$ echo 'Some ""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"

# THIS IS WHAT I WANT: 2 CONSECUTIVE ", AND IF THE FIRST ONE IS ALREADY ESCAPED I DON'T WANT TO ESCAPE IT TWICE.
karim@Requiem4Dream:~$ echo 'Some \""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"

# BY THE WAY THIS ONE WORKS:
karim@Requiem4Dream:~$ echo 'Some "text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"text\"

# BUT SURE NOT THIS ONE, NOT COVERED BY MY REGEX (I KNOW IT AND ORIGINALLY WANTED TO COVER IT):
karim@Requiem4Dream:~$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \\"text\"

By the way, in all the sed versions I work with, the '?' (0 or one match) should be escaped; that's the reason I have '\?'. Same thing with the saving '\(' and '\)' to store a value. In perl and grep you don't need to escape.

# SAMPLE FROM http://www.gnu.org/software/sed/manual/sed.html

|\+|
    same As |*|, but matches one or more. It is a GNU extension.
|\?|
    same As |*|, but only matches zero or one. It is a GNU extension

> I wouldn't expect it to work. See below.
>
> By the way, you don't need to escape the brackets or the question mark:
>
> [steve@sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
> Some \\"text\"
>
> > For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'
>
> No it is not.

Yes I know, see my latest post in detail; I already found the solution. I put the solution again below:

#Found the solution: '?' needs to be inside the parentheses (saved pattern) because outside we don't know if the saved match argument
#will exist or not, namely '\1'.

>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
(' \\"\\" ', 2)

> The pattern you are matching does not do what you think it does. "Zero or one of not-backslash, followed by a quote" will match a single quote *regardless* of what is before it. This is true even in sed, as you can see above, your sed regex matches both quotes.
\ will match, because the regular expression will match zero characters, followed by a quote. So the regex is correct. match = r'[^\\]?' # zero or one not-backslash followed by quote re.search(match, r'aaa\aaa').group() '' Now watch what happens when you call re.sub: match = r'([^\\])?' # group 1 equals a single non-backslash replace = r'\1\\' # group 1 followed by \ followed by re.sub(match, replace, '') # no matches '' re.sub(match, replace, '') # one match 'aa\\aa' re.sub(match, replace, '') # one match, but there's no group 1 Traceback (most recent call last): File stdin, line 1, in module File /usr/local/lib/python3.1/re.py, line 166, in sub return _compile(pattern, flags).sub(repl, string, count) File /usr/local/lib/python3.1/re.py, line 303, in filter return sre_parse.expand_template(template, match) File /usr/local/lib/python3.1/sre_parse.py, line 807, in expand_template raise error(unmatched
Re: [Tutor] RE module is working ?
Hello,

Any news on this topic? O:-)

Regards
Karim

On 02/02/2011 08:21 PM, Karim wrote:
> Hello,
>
> I am trying to subsitute a '""' pattern in '\"\"' namely escape 2 consecutives double quotes:
> [...]
> *Indeed what's the matter with RE module!?*
>
> *Any idea will be welcome!
>
> Regards
> Karim*
Re: [Tutor] RE module is working ?
Karim wrote:

> Hello,
>
> I am trying to subsitute a '""' pattern in '\"\"' namely escape 2 consecutives double quotes:

You don't have to escape quotes. Just use the other sort of quote:

>>> print '""'
""

> * *In Python interpreter:*
>
> $ python
> Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> expression = *' "" '*

No, I'm sorry, that's incorrect -- that gives a syntax error in every version of Python I know of, including version 2.7:

>>> expression = *' "" '*
  File "<stdin>", line 1
    expression = *' "" '*
                 ^
SyntaxError: invalid syntax

So what are you really running?

> >>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)

Likewise here. *r'...' is a syntax error, as is expression*)

I don't understand what you are running or why you are getting the results you are.

> *Indeed what's the matter with RE module!?*

There are asterisks all over your post! Where are they coming from?

What makes you think the problem is with the RE module?

We have a saying in English: "The poor tradesman blames his tools." Don't you think it's more likely that the problem is that you are using the module wrongly?

I don't understand what you are trying to do, so I can't tell you how to do it. Can you give an example of what you want to start with, and what you want to end up with? NOT Python code, just literal text, like you would type into a letter.

E.g. ABC means literally A followed by B followed by C.
\" means literally backslash followed by double-quote

-- 
Steven
Re: [Tutor] RE module is working ?
Hello Steven,

I am perhaps a poor tradesman, but I have to blame my Thunderbird tool :-P . Because expression = *' "" '* is in fact expression = ' "" '. The bold appears as stars, I don't know why. I need to have the escapes for passing the string to another language (TCL interpreter). So I will rewrite it, not _in bold_:

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = ' "" '
>>> re.subn(r'([^\\])?"', r'\1\\"', expression)

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

On linux, using my good old sed command, it is working with my '?' (0-1 match):

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
 \"\"

For me the linux/unix sed utility is trusty and is the reference.

Regards
Karim

On 02/03/2011 11:43 AM, Steven D'Aprano wrote:
> Karim wrote:
> > Hello,
> >
> > I am trying to subsitute a '""' pattern in '\"\"' namely escape 2 consecutives double quotes:
>
> You don't have to escape quotes. Just use the other sort of quote:
> [...]
Re: [Tutor] RE module is working ?
I forgot something. There is no issue with python and double quotes. But I need to give the string to a TCL script, and as TCL is shit, a string is only delimited by double quotes. Thus I need to escape the quotes so as not to get a syntax error with nested double quotes.

Regards
The poor tradesman

On 02/03/2011 12:45 PM, Karim wrote:
> Hello Steven,
>
> I am perhaps a poor tradesman but I have to blame my thunderbird tool :-P .
> [...]
Re: [Tutor] RE module is working ?
Karim wrote:

> I am trying to subsitute a '""' pattern in '\"\"' namely escape 2 consecutives double quotes:
> [...]
> *Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first; afterwards it's probably a good idea to try and explain your goal clearly, in plain English.

Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double quotes, that can be done with

    s = s.replace('""', r'\"\"')

but that's probably *not* what you want. Assuming you want to escape two consecutive double quotes and make sure that the first one isn't already escaped, this is my attempt:

>>> def sub(m):
...     s = m.group()
...     return r'\"\"' if s == '""' else s
...
>>> print re.compile(r'[\\].|""').sub(sub, r'\\\ \\ \ \\\ \\ \')
\\\ \ \\ \\\ \\ \

Compare that with

$ echo '\\\ \\ \ \\\ \\ \' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\ \\ \\ \\\ \\

Concerning the exception and the discrepancy between sed and python's re, I suggest that you ask it again on comp.lang.python aka the python-list mailing list, where at least one regex guru will read it.

Peter
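The technique in Peter's attempt can be sketched on its own: let the regex consume every backslash-plus-character pair first, so that a quote which is part of an escape never reaches the second alternative. The sample strings below are made up for illustration and are not the ones from the message above; escape_pairs is a hypothetical wrapper name.

```python
import re

# Either an escape pair (backslash plus any character) or two bare quotes.
TOKEN = re.compile(r'\\.|""')


def escape_pairs(text):
    """Escape "" only where neither quote is already part of an escape."""
    def sub(m):
        s = m.group()
        # Escape pairs are returned unchanged; only a bare "" is rewritten.
        return r'\"\"' if s == '""' else s
    return TOKEN.sub(sub, text)


# Hypothetical inputs: bare pair, escaped first quote, escaped backslash.
for sample in ['""', r'\""', r'\\""', 'a "" b']:
    print(sample, '->', escape_pairs(sample))
```

Because the alternation is tried left to right at each position, an escaped backslash followed by two quotes is handled correctly, which a simple "not preceded by a backslash" test would get wrong.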
Re: [Tutor] RE module is working ?
On 02/03/2011 02:15 PM, Peter Otten wrote:
> Karim wrote:
> > I am trying to subsitute a '""' pattern in '\"\"' namely escape 2 consecutives double quotes:
> > [...]
> > *Indeed what's the matter with RE module!?*
>
> You should really fix the problem with your email program first;

Thunderbird issue with bold type (appears as stars) but I don't know how to fix it yet.

> afterwards it's probably a good idea to try and explain your goal clearly, in plain English.

I already did it (cf. the mails queue). But to sum up: I pass the expression string to a TCL command, which delimits strings with double quotes only. Indeed I get an error with nested double quotes => that's the key problem.

> Yes. What Steven said ;)
>
> Now to your question as stated: if you want to escape two consecutive double quotes that can be done with
>
> s = s.replace('""', r'\"\"')

I have already done it as a workaround, but I have to add another replacement before it to consider all the other cases. I want to make the original command work so that I can remove the workaround.

> but that's probably *not* what you want. Assuming you want to escape two consecutive double quotes and make sure that the first one isn't already escaped,

You hit it! :-)

> this is my attempt:
>
> >>> def sub(m):
> ...     s = m.group()
> ...     return r'\"\"' if s == '""' else s
> ...
> >>> print re.compile(r'[\\].|""').sub(sub, r'\\\ \\ \ \\\ \\ \')

That is not the thing I want. I want to escape any " which are not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have made regex on unix since 15 years).

For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'
'?' is not accepted. Why? It is a character which should not be an antislash, with 0 or 1 occurrence. This is quite simple.

I am a poor tradesman but I don't deny evidence.

Regards
Karim

> \\\ \ \\ \\\ \\ \
>
> Compare that with
>
> $ echo '\\\ \\ \ \\\ \\ \' | sed 's/\([^\\]\)\?"/\1\\"/g'
> \\\ \\ \\ \\\ \\
>
> Concerning the exception and the discrepancy between sed and python's re, I suggest that you ask it again on comp.lang.python aka the python-list mailing list where at least one regex guru will read it.
>
> Peter
Re: [Tutor] RE module is working ?
On 01/-10/-28163 02:59 PM, Karim wrote:
> On 02/03/2011 02:15 PM, Peter Otten wrote:
> > Karim wrote:
> > (snip)
> > > *Indeed what's the matter with RE module!?*
> > You should really fix the problem with your email program first;
> Thunderbird issue with bold type (appears as stars) but I don't know how to fix it yet.

The simple fix is not to try to add bold or colors on a text message. Python-tutor is a text list, not an html one. Thunderbird tries to accommodate you by adding the asterisks, which is fine if it's regular English. But in program code, it obviously confuses things.

While I've got you, can I urge you not to top-post? In this message, you correctly added your remarks after the part you were quoting. But many times you put your comments at the top, which is backwards.

DaveA

-- 
da...@ieee.org
Re: [Tutor] RE module is working ?
On 02/03/2011 11:20 PM, Dave Angel wrote:
> The simple fix is not to try to add bold or colors on a text message. Python-tutor is a text list, not an html one.
> [...]
> While I've got you, can I urge you not to top-post?
> [...]
>
> DaveA

Sorry Dave,

I will try and do my best to avoid bold and top-posting in the future.

Regards
Karim
Re: [Tutor] RE module is working ?
On 02/03/2011 07:47 PM, Karim wrote:
> On 02/03/2011 02:15 PM, Peter Otten wrote:
> [...]
> That is not the thing I want. I want to escape any " which are not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have made regex on unix since 15 years).
>
> For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'
> '?' is not accepted Why? character which should not be an antislash with 0 or 1 occurence. This is quite simple.
>
> I am a poor tradesman but I don't deny evidence.

Recall:

>>> re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

Found the solution: the '?' needs to be inside the parentheses (the saved pattern), because when it is outside we don't know whether the saved match '\1' will exist or not.

>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
(' \\"\\" ', 2)

The unix sed command is more permissive:

    sed 's/\([^\\]\)\?"/\1\\"/g'

because the '?' can be outside the parentheses (the saved pattern, escaped for sed). \1 seems not to cause an issue when a match is found. Perhaps it is created only when a match occurs.

MORALITY:

1) The behaviour of python is logical and I must understand what I do with it.
2) sed is a fantastic tool because it manages the match value when it is missing.
3) I am a real poor tradesman

Regards
Karim
Re: [Tutor] RE module is working ?
Karim karim.liat...@free.fr wrote

> Because expression = *' "" '* is in fact expression = ' "" '.
> The bold appear as stars I don't know why.

Because in the days when email was always sent in plain ASCII text, the way to show bold was to put asterisks around it. Underlining used _underscores_ like so...

Obviously somebody decided that Thunderbird would stick with those conventions when translating HTML to text :-)

Quite smart really :-)

Alan G.
Re: [Tutor] RE module is working ?
Karim wrote:

> > > *Indeed what's the matter with RE module!?*
> > You should really fix the problem with your email program first;
> Thunderbird issue with bold type (appears as stars) but I don't know how to fix it yet.

A man went to a doctor and said, "Doctor, every time I do this, it hurts. What should I do?"

The doctor replied, "Then stop doing that!"

:)

Don't add bold or any other formatting to things which should be program code. Even if it looks okay in *your* program, you don't know how it will look in other people's programs. If you need to draw attention to something in a line of code, add a comment, or talk about it in the surrounding text.

[...]

> That is not the thing I want. I want to escape any " which are not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have made regex on unix since 15 years).

Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU posix compliant regexes? grep or egrep regexes? They're all different.

In any case, I am sorry, I don't think your regex does what you say. When I try it, it doesn't work for me.

[steve@sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \\"text\"

I wouldn't expect it to work. See below.

By the way, you don't need to escape the brackets or the question mark:

[steve@sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
Some \\"text\"

> For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'

No it is not.

The pattern you are matching does not do what you think it does. "Zero or one of not-backslash, followed by a quote" will match a single quote *regardless* of what is before it. This is true even in sed, as you can see above, your sed regex matches both quotes.

\" will match, because the regular expression will match zero characters, followed by a quote. So the regex is correct.

>>> match = r'[^\\]?"'  # zero or one not-backslash followed by quote
>>> re.search(match, r'aaa\"aaa').group()
'"'

Now watch what happens when you call re.sub:

>>> match = r'([^\\])?"'
# group 1 equals a single non-backslash replace = r'\1\\' # group 1 followed by \ followed by re.sub(match, replace, '') # no matches '' re.sub(match, replace, '') # one match 'aa\\aa' re.sub(match, replace, '') # one match, but there's no group 1 Traceback (most recent call last): File stdin, line 1, in module File /usr/local/lib/python3.1/re.py, line 166, in sub return _compile(pattern, flags).sub(repl, string, count) File /usr/local/lib/python3.1/re.py, line 303, in filter return sre_parse.expand_template(template, match) File /usr/local/lib/python3.1/sre_parse.py, line 807, in expand_template raise error(unmatched group) sre_constants.error: unmatched group Because group 1 was never matched, Python's re.sub raised an error. It is not a very informative error, but it is valid behaviour. If I try the same thing in sed, I get something different: [steve@sylar ~]$ echo 'Some text' | sed -re 's/([^\\])?/\1\\/g' \Some text It looks like this version of sed defines backreferences on the right-hand side to be the empty string, in the case that they don't match at all. But this is not standard behaviour. The sed FAQs say that this behaviour will depend on the version of sed you are using: Seds differ in how they treat invalid backreferences where no corresponding group occurs. http://sed.sourceforge.net/sedfaq3.html So you can't rely on this feature. If it works for you, great, but it may not work for other people. When you delete the ? from the Python regex, group 1 is always valid, and you don't get an exception. Or if you ensure the input always matches group 1, no exception: match = r'([^\\])?' replace = r'\1\\' re.sub(match, replace, '') # group 1 always matches 'a\\a\\a\\a' (It still won't do what you want, but that's a *different* problem.) Jamie Zawinski wrote: Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems. How many hours have you spent trying to solve this problem using regexes? 
This is a *tiny* problem that requires an easy solution, not wrestling with a programming language that looks like line-noise.

This should do what you ask for:

    def escape(text):
        """Escape any double-quote characters if and only if they
        aren't already escaped."""
        output = []
        escaped = False
        for c in text:
            if c == '"' and not escaped:
                output.append('\\')
            elif c == '\\':
                output.append('\\')
                escaped = True
                continue
            output.append(c)
            escaped = False
        return ''.join(output)

Armed with this helper function, which took me two minutes to write, I can do this:

>>> text = 'Some text with backslash-quotes \\" and plain quotes " together.'
>>> print escape(text)
Some text with backslash-quotes \" and plain quotes \" together.

Most problems that people turn to regexes are best solved
[Tutor] RE module is working ?
Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 consecutive double quotes.

In the Python interpreter:

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = ' "" '
>>> re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution... But this is not the same regex. And count=2 does nothing. By default all occurrences should be substituted.

On Linux, using my good old sed command, it works with my '?' (0-1 match):

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
 \"\"

Indeed, what's the matter with the re module!?

Any idea will be welcome!

Regards
Karim

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
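The "unmatched group" error in this message comes from a replacement template that refers to a group which took no part in the match. One possible way around it, sketched here in Python 3, is to pass a function instead of a template, so that a missing group is turned into an empty string explicitly. The name escape_quotes is made up for this illustration, and the pattern assumes the character being escaped is a double quote, as the message describes. (Since Python 3.5, re.sub itself substitutes an empty string for unmatched groups, so the template form no longer raises there.)

```python
import re

# Optional non-backslash character, then a double quote (assumed target).
PATTERN = re.compile(r'([^\\])?"')


def escape_quotes(text):
    """Hypothetical helper: put a backslash before each matched quote.

    Returns the (new_string, number_of_substitutions) pair from subn().
    """
    def repl(m):
        # m.group(1) is None when the optional group did not participate.
        return (m.group(1) or '') + '\\"'

    return PATTERN.subn(repl, text)


result, count = escape_quotes(' "" ')
print(result, count)
```

Like the original pattern, this sketch does not leave already-escaped quotes alone; it only shows how to avoid the exception.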
[Tutor] re module / separator
Hi!

I am trying to split some lists out of a single text file, and I am having a hard time. I have reduced the problem to the following one:

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

Of this line of text, I want to take out strings where all words start with "a" and end with "b.". But I don't want a list of words. I want that:

["a2345b.", "a45453b. a325643b. a435643b."]

And I feel I still don't fully understand regular expression's logic. I do not understand the results below:

In [33]: re.search(r"(a[^.]*?b\.\s?){2}", text).group(0)
Out[33]: 'a45453b. a325643b. '

In [34]: re.findall(r"(a[^.]*?b\.\s?){2}", text)
Out[34]: ['a325643b. ']

In [35]: re.search(r"(a[^.]*?b\.\s?)+", text).group(0)
Out[35]: 'a2345b. '

In [36]: re.findall(r"(a[^.]*?b\.\s?)+", text)
Out[36]: ['a2345b. ', 'a435643b. ']

What's the difference between search and findall in [33-34]? And why can't I generalize [33] to [35]? Out[35] would make sense to me if I had put a non-greedy +, but why does re get only one word?

Thanks,
Tiago Saboga.
Re: [Tutor] re module / separator
Hey Tiago,

> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with "a" and end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]

Are you saying you want a list of every item that starts with an "a" and ends with a "b"? If so, the above list is not what you're after. It only contains two items:

a2345b.
a45453b. a325643b. a435643b.

You can verify this by trying len(["a2345b.", "a45453b. a325643b. a435643b."]). You can also see that each item is wrapped in double quotes and separated by a comma.

> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:

Try reading this:

http://www.amk.ca/python/howto/regex/

I've found it to be a very gentle and useful introduction to regexes. It explains, among other things, what the search and findall methods do.

You should definitely take the time to read up on regexes. Your patterns grew too complex for this problem (again, if I'm understanding you right), which is probably why you're not understanding your results.

If I'm understanding your problem correctly, you probably want the findall method:

In [9]: re.findall(r'a[a-z0-9]+b',text)
Out[9]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

There are other ways to perform the above, for instance using the \w metacharacter to match any alphanumeric.

In [20]: re.findall(r'a\w+b',text)
Out[20]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

Or, to get even more (needlessly) complicated:

In [21]: re.findall(r'\ba\w+b\b',text)
Out[21]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

As you learned, regexes can get really complicated, really quickly if you don't understand the syntax. Others with more experience might offer more elegant solutions to your problem, but I'd still encourage you to read up on the basics and get comfortable with the re module. It's a great tool once you understand it.
Best of luck,
Serdar
Re: [Tutor] re module / separator
Serdar Tumgoren zstumgo...@gmail.com writes:

> Hey Tiago,
>
>> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>>
>> Of this line of text, I want to take out strings where all words start
>> with "a" and end with "b.". But I don't want a list of words. I want that:
>>
>> ["a2345b.", "a45453b. a325643b. a435643b."]
>
> Are you saying you want a list of every item that starts with an "a" and
> ends with a "b"? If so, the above list is not what you're after. It only
> contains two items:
>
> a2345b.
> a45453b. a325643b. a435643b.

Yes, I want to find only two items. I want every sequence of words where every word begins with an "a" and ends with "b.".

> Try reading this:
> http://www.amk.ca/python/howto/regex/

I have read it several times, and I thought I understood it quite well ;)

I don't have the time to do it right now, but if it turns out to be useful, I can show why I came to the patterns I sent to the list.

Thanks,
Tiago.
Re: [Tutor] re module / separator
On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga tiagosab...@gmail.com wrote:

> Hi!
>
> I am trying to split some lists out of a single text file, and I am
> having a hard time. I have reduced the problem to the following one:
>
> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with "a" and end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>
> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:
>
> In [33]: re.search(r"(a[^.]*?b\.\s?){2}", text).group(0)
> Out[33]: 'a45453b. a325643b. '

group(0) is the entire match so this returns what you expect. But what is group(1)?

In [6]: re.search(r"(a[^.]*?b\.\s?){2}", text).group(1)
Out[6]: 'a325643b. '

Repeated groups are tricky; the returned value contains only the last match for the group, not the earlier repeats.

> In [34]: re.findall(r"(a[^.]*?b\.\s?){2}", text)
> Out[34]: ['a325643b. ']

When the re contains groups, re.findall() returns the groups. It doesn't return the whole match. So this is giving group(1), not group(0). You can get the whole match by explicitly grouping it:

In [4]: re.findall(r"((a[^.]*?b\.\s?){2})", text)
Out[4]: [('a45453b. a325643b. ', 'a325643b. ')]

> In [35]: re.search(r"(a[^.]*?b\.\s?)+", text).group(0)
> Out[35]: 'a2345b. '

You only get the first match, so this is correct.

> In [36]: re.findall(r"(a[^.]*?b\.\s?)+", text)
> Out[36]: ['a2345b. ', 'a435643b. ']

This is finding both matches but the grouping has the same difficulty as the previous findall(). This is closer:

In [7]: re.findall(r"((a[^.]*?b\.\s?)+)", text)
Out[7]: [('a2345b. ', 'a2345b. '), ('a45453b. a325643b. a435643b. ', 'a435643b. ')]

If you change the inner parentheses to be non-grouping then you get pretty much what you want:

In [8]: re.findall(r"((?:a[^.]*?b\.\s?)+)", text)
Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']

Kent
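The difference between capturing and non-capturing groups described in this reply can be checked with a short script. This is only a sketch for Python 3; the variable names are chosen for the illustration, and the finditer() variant is an extra option not mentioned in the message.

```python
import re

text = 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'

# Capturing group under a repeat: findall() returns the group, and a
# repeated group keeps only its last iteration.
capturing = re.findall(r'(a[^.]*?b\.\s?)+', text)

# Non-capturing group: with no groups in the pattern, findall() returns
# the whole match for each run of words.
whole = re.findall(r'(?:a[^.]*?b\.\s?)+', text)

# finditer() yields match objects, so group(0) (the whole match) is
# available even when the pattern contains capturing groups.
via_finditer = [m.group(0).strip()
                for m in re.finditer(r'(a[^.]*?b\.\s?)+', text)]

print(capturing)
print(whole)
print(via_finditer)
```

Stripping each whole match, as the last line does, removes the trailing space that the optional \s? picks up.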
Re: [Tutor] re module / separator
As usual, Kent Johnson has swooped in and untangled the mess with a clear explanation.

By the time a regex gets this complicated, I typically start thinking of ways to simplify or avoid them altogether.

Below is the code I came up with. It goes through some gymnastics and can surely stand improvement, but it seems to get the job done. Suggestions are welcome.

In [83]: text
Out[83]: 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'

In [84]: textlist = text.split()

In [85]: textlist
Out[85]: ['a2345b.', 'f325.', 'a45453b.', 'a325643b.', 'a435643b.', 'g234324b.']

In [86]: newlist = []

In [87]: pat = re.compile(r'a\w+b\.')

In [88]: for item in textlist:
   ....:     if pat.match(item):
   ....:         newlist.append(item)
   ....:     else:
   ....:         newlist.append('|')
   ....:

In [89]: newlist
Out[89]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [90]: lastlist = ''.join(newlist)

In [91]: lastlist
Out[91]: 'a2345b.|a45453b.a325643b.a435643b.|'

In [92]: lastlist.rstrip('|').split('|')
Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']
Re: [Tutor] re module / separator
Ok -- realized my solution incorrectly strips white space from multiword strings:

Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']

So here are some more gymnastics to get the correct result:

In [105]: newlist
Out[105]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [109]: lastlist2 = ' '.join(newlist).rstrip('|').split('|')

In [110]: lastlist3 = [item.strip() for item in lastlist2]

In [111]: lastlist3
Out[111]: ['a2345b.', 'a45453b. a325643b. a435643b.']
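The split, mark and re-join steps in this message could also be written with itertools.groupby, which collects consecutive words that pass the same test. This is a different technique from the one in the message, offered only as a sketch for Python 3.4 or later; the function name group_matching_runs is invented for the example.

```python
import re
from itertools import groupby

text = 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'
pat = re.compile(r'a\w+b\.')


def group_matching_runs(text, pattern):
    """Join each run of consecutive words that fully match the pattern."""
    runs = []
    words = text.split()
    # groupby() starts a new group every time the key (match / no match)
    # changes, so each group is one unbroken run of words.
    for is_match, run in groupby(words, key=lambda w: bool(pattern.fullmatch(w))):
        if is_match:
            runs.append(' '.join(run))
    return runs


result = group_matching_runs(text, pat)
print(result)
```

Because the words are joined with a single space inside each run, no separator character or final strip() is needed.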
[Tutor] Re: Module Loop doesn't work (Joseph Q.)
Joseph Quigley wrote on Fri, 01 Apr 2005 10:07:08 -0600:

> I have some code on a geek dictionary that I'm making where the command
> geeker() opens a module for the real geek dictionary (where you can type
> a word to see what it is geekified). Supposedly, you type lobby() to go
> back to what I call the lobby (where you can get info on the web site and
> email and version). But it just loops back to the Geeker prompt where you
> type the word that you want geekified. I even tried having it restart the
> whole program by importing the index module that I wrote. But it still
> won't restart the program!

Without seeing your code, I doubt anyone will be able to solve your problem except by pure chance. In addition to that, I'm confused by the use of function calls in what seems to be essentially a menu system.

Speaking in general terms, the way you could handle this is as follows:

- have a main menu loop (what you call the lobby) which accepts user input and, based on that input, calls other functions which perform certain tasks (e.g. open a webpage or go to the dictionary part)
- the dictionary part would in turn be another loop accepting words as input which get 'translated', until the user gives a blank string or whatever as input in order to terminate the loop (and automatically fall back into the loop of the lobby)

--
Yours,

Andrei

=
Real contact info (decode with rot13): [EMAIL PROTECTED] Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq gur yvfg, fb gurer'f ab arrq gb PP.
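The two-loop layout suggested in this reply might look roughly like the sketch below. Since the original program was never posted, everything here is an assumption made for illustration: the command names, the placeholder geekify() translation, and the read/write parameters (added only so the loops can be driven with canned input).

```python
def geekify(word):
    """Placeholder translation (hypothetical); the real lookup goes here."""
    return word.upper()


def geeker(read=input, write=print):
    """Dictionary loop: translate words until a blank line is entered."""
    while True:
        word = read('Geeker> ').strip()
        if not word:
            return  # returning drops us back into the caller's loop
        write(geekify(word))


def lobby(read=input, write=print):
    """Main menu loop: dispatch on the command the user types."""
    while True:
        command = read('Lobby> ').strip().lower()
        if command == 'geeker':
            geeker(read, write)
        elif command == 'info':
            write('Geek dictionary (placeholder info text)')
        elif command == 'quit':
            return
        else:
            write('Unknown command: ' + command)


def run_scripted(lines):
    """Drive lobby() with canned answers instead of the keyboard."""
    answers = iter(lines)
    output = []
    lobby(read=lambda prompt: next(answers), write=output.append)
    return output


demo = run_scripted(['geeker', 'hello', '', 'info', 'quit'])
print(demo)
```

Calling lobby() with no arguments would run the same loops interactively. Note that going back to the lobby is just a return from geeker(); nothing needs to be re-imported or restarted.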