<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hello, > > I cannot figure out a way to find a regular expression that would > match one and only one of these two strings: > > s1 = ' how are you' > s2 = ' hello world how are you' > > All I could come up with was: > patt = re.compile('^[ ]*([A-Za-z]+)[ ]+([A-Za-z]+)$') > > Which of course does not work. I cannot express the fact: sentence > have 0 or 1 whitespace, separation of group have two or more > whitespaces. > > Any suggestion ? Thanks a bunch ! > Mathieu > A pyparsing approach is not as terse as regexp's, but it's not terribly long either. Following John Machin's submission as a pattern:
s1 = ' how are you' s2 = ' hello world how are you' from pyparsing import * wd = Word(printables) # this is necessary to suppress pyparsing's built-in whitespace skipping wd.leaveWhitespace() sentence = delimitedList(wd, delim=White(' ',exact=1)) for test in (s1,s2): print sentence.searchString(test) Pyparsing returns data as ParseResults objects, which can be accessed as lists or dicts. From this first cut, we get: [['how', 'are', 'you']] [['hello', 'world'], ['how', 'are', 'you']] These aren't really sentences any more, but we can have pyparsing put them back into sentences, by adding a parse action to sentence. sentence.setParseAction(lambda toks: " ".join(toks)) Now our results are: [['how are you']] [['hello world'], ['how are you']] If you really want to get fancy, and clean up some of that capitalization and lack of punctuation, you can add a more elaborate parse action instead: ispunc = lambda s: s in ".!?;:," sixLoyalServingMen = ('What','Why','When','How','Where','Who') def cleanup(t): t[0] = t[0].title() if not ispunc( t[-1][-1] ): if t[0] in sixLoyalServingMen: punc = "?" else: punc = "." else: punc = "" return " ".join(t) + punc sentence.setParseAction(cleanup) This time we get: [['How are you?']] [['Hello world.'], ['How are you?']] The pyparsing home page is at pyparsing.wikispaces.com. -- Paul -- http://mail.python.org/mailman/listinfo/python-list