That worked. Thank you again :)
Victor

On Mon, Aug 3, 2009 at 12:13 AM, Gabriel Genellina <gagsl-...@yahoo.com.ar> wrote:
> On Sun, 02 Aug 2009 18:22:20 -0300, Victor Subervi
> <victorsube...@gmail.com> wrote:
>
>> How do I search and replace something like this:
>> aLine = re.sub('[<]?[p]?[>]?<font size="h' + str(x) +
>>                '"[ a-zA-Z0-9"\'=:]*>[<]?[b]?[>]?',
>>                '<h' + str(x) + '>', aLine)
>> where RE *only* looks for the possibility of "<p>" at the beginning of the
>> string; that is, not the individual components as I have it coded above,
>> but the entire 3-character block?
>
> An example would make it clearer; I think you want to match either
> "<p><font size=...." or "<font size=....". In other words, "<p>" is
> optional. Use a normal group or a non-capturing group:
>
>     r'(<p>)?<font size="...'
>     r'(?:<p>)?<font size="...'
>
> That said, using regular expressions to parse HTML or XML is terribly
> fragile; I'd use a specific tool (like BeautifulSoup, ElementTree, or lxml).
>
> --
> Gabriel Genellina
>
> --
> http://mail.python.org/mailman/listinfo/python-list
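
For the archives, here is a minimal sketch of the non-capturing group suggestion applied to the original pattern. The function name font_to_heading and the sample strings are made up for illustration, and treating the trailing "<b>" the same way as "<p>" is my assumption about what was wanted; the attribute character class is copied unchanged from the question.

```python
import re

def font_to_heading(line, level):
    """Replace an optional <p>, a <font size="hN" ...> tag and an
    optional <b> with a single <hN> tag (hypothetical helper)."""
    pattern = (
        r'(?:<p>)?'                        # the whole "<p>" block, or nothing
        r'<font size="h' + str(level) + r'"'
        '[ a-zA-Z0-9"\'=:]*>'              # remaining attributes, as in the question
        r'(?:<b>)?'                        # assumption: "<b>" is optional as a block too
    )
    return re.sub(pattern, '<h' + str(level) + '>', line)

print(font_to_heading('<p><font size="h2" color="red"><b>Title', 2))  # <h2>Title
print(font_to_heading('<font size="h2">Title', 2))                    # <h2>Title
print(font_to_heading('p<font size="h2">Title', 2))                   # p<h2>Title
```

The last call shows the difference from '[<]?[p]?[>]?': a stray "p" in front of the tag is left alone, because the group only matches the complete "<p>".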