On Apr 27, 8:50 am, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Apr 27, 9:10 am, proctor <[EMAIL PROTECTED]> wrote:
>
> > On Apr 27, 1:33 am, Paul McGuire <[EMAIL PROTECTED]> wrote:
> >
> > > On Apr 27, 1:33 am, proctor <[EMAIL PROTECTED]> wrote:
> > >
> > > > hello,
> > > >
> > > > i have a regex: rx_test = re.compile('/x([^x])*x/')
> > > >
> > > > which is part of this test program:
> > > >
> > > > ============
> > > > import re
> > > >
> > > > rx_test = re.compile('/x([^x])*x/')
> > > > s = '/xabcx/'
> > > >
> > > > if rx_test.findall(s):
> > > >     print rx_test.findall(s)
> > > > ============
> > > >
> > > > i expect the output to be ['abc'] however it gives me only the last
> > > > single character in the group: ['c']
> > > >
> > > > C:\test>python retest.py
> > > > ['c']
> > > >
> > > > can anyone point out why this is occurring? i can capture the entire
> > > > group by doing this:
> > > >
> > > > rx_test = re.compile('/x([^x]+)*x/')
> > > >
> > > > but why isn't the 'star' grabbing the whole group? and why isn't each
> > > > letter 'a', 'b', and 'c' present, either individually, or as a group
> > > > (group is expected)?
> > > >
> > > > any clarification is appreciated!
> > > >
> > > > sincerely,
> > > > proctor
> > >
> > > As Josiah already pointed out, the * needs to be inside the grouping
> > > parens.
> > >
> > > Since re's do lookahead/backtracking, you can also write:
> > >
> > > rx_test = re.compile('/x(.*?)x/')
> > >
> > > The '?' is there to make sure the .* repetition stops at the first
> > > occurrence of x/.
> > >
> > > -- Paul
> >
> > i am working through an example from the oreilly book mastering
> > regular expressions (2nd edition) by jeffrey friedl. my post was a
> > snippet from a regex to match C comments. every 'x' in the regex
> > represents a 'star' in actual usage, so that backslash escaping is not
> > needed in the example (on page 275).
> > it looks like this:
> >
> > ===========
> > /x([^x]|x+[^/x])*x+/
> >
> > it is supposed to match '/x', the opening delimiter, then
> >
> > (
> >   either anything that is 'not x',
> >   or,
> >   'x' one or more times, 'not followed by a slash or an x'
> > ) any number of times (the 'star')
> >
> > followed finally by the closing delimiter.
> > ===========
> >
> > this does not seem to work in python the way i understand it should
> > from the book, and i simplified the example in my first post to
> > concentrate on just one part of the alternation that i felt was not
> > acting as expected.
> >
> > so my question remains, why doesn't the star quantifier seem to grab
> > all the data. isn't findall() intended to return all matches? i
> > would expect either 'abc' or 'a', 'b', 'c' or at least just
> > 'a' (because that would be the first match). why does it give only
> > one letter, and at that, the /last/ letter in the sequence??
> >
> > thanks again for replying!
> >
> > sincerely,
> > proctor
>
> Again, I'll repeat some earlier advice: you need to move the '*'
> inside the parens - you are still leaving it outside. Also, get in
> the habit of using raw literal notation (that is r"slkjdfljf" instead
> of "lsjdlfkjs") when defining re strings - you don't have backslash
> issues yet, but you will as soon as you start putting real '*'
> characters in your expression.
>
> However, when I test this,
>
> restr = r'/x(([^x]|x+[^/])*)x+/'
> re_ = re.compile(restr)
> print re_.findall("/xabxxcx/ /x123xxx/")
>
> findall now starts to give a tuple for each "comment",
>
> [('abxxc', 'xxc'), ('123xx', 'xx')]
>
> so you have gone beyond my limited re skill, and will need help from
> someone else.
>
> But I suggest you add some tests with multiple consecutive 'x'
> characters in the middle of your comment, and multiple consecutive 'x'
> characters before the trailing comment.
> In fact, from my recollections of trying to implement this type of
> comment recognizer by hand a long time ago in a job far, far away,
> test with both even and odd numbers of 'x' characters.
>
> -- Paul
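A minimal sketch reproducing the original question, assuming Python 3 (the thread itself uses Python 2 print statements). When a capturing group is itself repeated by a quantifier, the group only remembers the text from its last repetition, so '([^x])*' reports 'c' rather than 'abc'. Moving the star inside the parens, as Josiah and Paul suggest, or using Paul's non-greedy form captures the whole body. The variable names are illustrative only.

```python
import re

s = '/xabcx/'

# Star OUTSIDE the group: each repetition of ([^x]) overwrites the
# previous capture, so only the last character survives.
star_outside = re.compile(r'/x([^x])*x/')
print(star_outside.findall(s))        # ['c']

# The overall match still covers the whole "comment"; only group 1 is short.
m = star_outside.search(s)
print(m.group(0), m.group(1))         # /xabcx/ c

# Star INSIDE the group: one capture spans every repetition.
star_inside = re.compile(r'/x([^x]*)x/')
print(star_inside.findall(s))         # ['abc']

# Paul's non-greedy alternative: .*? stops at the first 'x/'.
lazy = re.compile(r'/x(.*?)x/')
print(lazy.findall(s))                # ['abc']
```

This also explains why findall() does not return 'a', 'b', 'c' separately: findall() returns one item per match of the whole pattern, and '/xabcx/' contains only one such match.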
thanks paul,

the reason the regex now gives tuples is that there are now 2 groups,
the inner and outer parens. so group 1 (the outer parens) wraps the
star and captures the whole body, while group 2 (the inner parens) is
what the star repeats, so it only keeps its last repetition.

sincerely,
proctor

--
http://mail.python.org/mailman/listinfo/python-list
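A sketch of the two-group behaviour proctor describes, again assuming Python 3 (re.fullmatch needs 3.4 or later). Making the inner, repeated group non-capturing with (?:...) leaves a single group, so findall() returns one string per comment. The last pattern removes capturing groups entirely from the Friedl-style comment regex quoted earlier ('x' still standing in for '*'), so findall() returns whole matches. It is checked against the even and odd runs of 'x' that Paul suggests testing. The sample strings are made up for illustration.

```python
import re

text = "/xabxxcx/ /x123xxx/"

# Paul's pattern: outer group 1 wraps the star, inner group 2 is repeated
# by it, so findall() returns an (outer, inner) tuple per match.
two_groups = re.compile(r'/x(([^x]|x+[^/])*)x+/')
print(two_groups.findall(text))   # [('abxxc', 'xxc'), ('123xx', 'xx')]

# Same pattern with the inner group made non-capturing: one string per match.
one_group = re.compile(r'/x((?:[^x]|x+[^/])*)x+/')
print(one_group.findall(text))    # ['abxxc', '123xx']

# Friedl-style comment pattern with no capturing groups at all, so
# findall() returns each complete "comment".
comment = re.compile(r'/x(?:[^x]|x+[^/x])*x+/')

samples = [
    "/xx/",        # empty comment
    "/xabcx/",     # plain body
    "/xaxxbx/",    # even run of x in the middle
    "/xaxxxbx/",   # odd run of x in the middle
    "/xabcxx/",    # even run of x before the closing '/'
    "/xabcxxx/",   # odd run of x before the closing '/'
]
for sample in samples:
    print(sample, comment.fullmatch(sample) is not None)   # all True

# An unterminated comment does not match.
print(comment.fullmatch("/xabc/"))   # None

# Two comments in one string are matched separately, not merged.
print(comment.findall("a /xonex/ b /xtwoxx/ c"))   # ['/xonex/', '/xtwoxx/']
```

Making a group non-capturing does not change what the pattern matches; it only changes what findall() reports for each match.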