Fredrik Lundh wrote:
> Steven Bethard wrote:
>> I feel like there should be a simpler solution (maybe with the re
>> module?) but I can't figure one out. Any suggestions?
>
> using the finditer pattern I just posted in another thread:
>
>     tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
>     text = '''\
>     She's gonna write
>     a book?'''
>
>     import re
>
>     tokens.sort() # lexical order
>     tokens.reverse() # look for longest match first
>     pattern = "|".join(map(re.escape, tokens))
>     pattern = re.compile(pattern)
>
> I get
>
>     print [m.span() for m in pattern.finditer(text)]
>     [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
>
> which seems to match your version pretty well.

That's what I was looking for. Thanks!

STeVe
--
http://mail.python.org/mailman/listinfo/python-list
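
For anyone reading this thread later, here is a minimal Python 3 sketch of the approach quoted above. The helper name token_spans is made up for illustration and is not from the original posts. It assumes the tokens appear in the text in order and that nothing other than whitespace sits between them; the regex by itself does not check this, so the last lines compare the matched substrings against the token list.

```python
import re

def token_spans(tokens, text):
    """Return (start, end) spans of every token occurrence in text.

    token_spans is a hypothetical helper name, not part of the re module.
    """
    # Reverse lexical order places a longer token ahead of any token that
    # is a prefix of it, so the alternation tries the longer one first.
    ordered = sorted(set(tokens), reverse=True)
    pattern = re.compile("|".join(map(re.escape, ordered)))
    return [m.span() for m in pattern.finditer(text)]

tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = "She's gonna write\na book?"

spans = token_spans(tokens, text)
print(spans)

# Sanity check: the matched substrings should reproduce the token list.
matched = [text[start:end] for start, end in spans]
print(matched == tokens)
```

Unlike the quoted version, this sorts a copy rather than the list in place, so the original token order is still available for the comparison at the end. If the text contains material that is not covered by the tokens, a short token such as 'a' can match inside an unrelated word, and the comparison would then come out False.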