On Thu, 7 Apr 2005, Danny Yoo wrote:

> On Wed, 6 Apr 2005, Kent Johnson wrote:
>
> > > >>> s = 'Hi "Python Tutors" please help'
> > > >>> s.split()
> > >
> > > ['Hi', '"Python', 'Tutors"', 'please', 'help']
> > >
> > > I wish it would leave the stuff in quotes intact:
> > >
> > > ['Hi', '"Python Tutors"', 'please', 'help']
> >
> > You can do this easily with the csv module. The only complication is
> > that the string has to be wrapped in a StringIO to turn it into a
> > file-like object.
>
> Hello!
>
> A variation of Kent's approach might be to use the 'tokenize' module:
>
>     http://www.python.org/doc/lib/module-tokenize.html
>
> which takes advantage of Python's tokenizer itself to break lines into
> chunks of tokens.  If you intend your input to be broken up just like
> Python tokens, the 'tokenize' module might be ok:
>
> ######
> >>> import tokenize
> >>> from StringIO import StringIO
> >>> def getListOfTokens(s):
> ...     results = []
> ...     for tokenTuple in tokenize.generate_tokens(StringIO(s).readline):
> ...         results.append(tokenTuple[1])
> ...     return results
> ...
> >>> getListOfTokens('Hi "Python Tutors" please help')
> ['Hi', '"Python Tutors"', 'please', 'help', '']
> ######
>
> (The last token, the empty string, is EOF, which can be filtered out if we
> use the token.ISEOF() function.)
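
For reference, here is a minimal sketch of both suggestions side by side. It assumes a current Python 3 interpreter (the session quoted above is Python 2, where StringIO lived in its own module), and the function names split_with_csv and split_with_tokenize are made up for illustration. The tokenize version also skips newline tokens, which newer interpreters emit in addition to the EOF token mentioned above.

```python
import csv
import io
import token
import tokenize


def split_with_csv(s):
    """Split on single spaces; double-quoted phrases stay together, quotes are dropped."""
    # Wrap the string in StringIO so csv.reader sees a file-like object,
    # then take the first (and only) row.
    return next(csv.reader(io.StringIO(s), delimiter=" "))


def split_with_tokenize(s):
    """Split s the way Python's own tokenizer would; quote characters are kept."""
    # Token types that carry no text we care about.
    skip = {token.NEWLINE, tokenize.NL}
    return [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(s).readline)
        if not token.ISEOF(tok.type) and tok.type not in skip
    ]


if __name__ == "__main__":
    line = 'Hi "Python Tutors" please help'
    print(split_with_csv(line))       # ['Hi', 'Python Tutors', 'please', 'help']
    print(split_with_tokenize(line))  # ['Hi', '"Python Tutors"', 'please', 'help']
```

One visible difference: the csv version strips the surrounding quotes, while the tokenize version keeps them. The tokenize version also treats the input as Python source, so text that is not made of valid Python tokens (a stray apostrophe, for instance) may not split the way you expect.
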
In my context, I expect exactly 8 tokens so the extra '' wouldn't be
noticed.

> I'm not sure if this is appropriate for Marilyn's purposes though, but I
> thought I might just toss it out.  *grin*

Thank you Danny.  Very interesting.  Both approaches are perfect for me.

Is there a reason to prefer one over the other?  Is one faster?  I
compiled my regular expression to make it quicker.

What a rich language!  So many choices.

Marilyn

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor