"conan" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > This regexp > '<widget class=".*" id=".*">' > > works well with 'grep' for matching lines of the kind > <widget class="GtkWindow" id="window1"> > > on a XML .glade file >
As Peter Otten has already mentioned, this is the difference between the re "match" and "search" methods: match only succeeds if the pattern matches at the very start of the string, while search (like grep) looks for the pattern anywhere in the string.

As purely a lateral exercise, here is a pyparsing rendition of your program:

------------------------------------
from pyparsing import makeXMLTags, line

# define pyparsing patterns for begin and end XML tags
widgetStart, widgetEnd = makeXMLTags("widget")

# read the file contents
glade_file_name = 'some.glade'
with open(glade_file_name) as glade_file:
    gladeContents = glade_file.read()

# scan the input string for matching tags
for widget, start, end in widgetStart.scanString(gladeContents):
    # print the full source line on which the tag starts
    print("good:", line(start, gladeContents).strip())
    # tag attributes are available by name, dict-style
    print(widget["class"], widget["id"])
    print("Class: %(class)s; Id: %(id)s" % widget)
------------------------------------

Not quite an exact match for your program, since only the good lines get listed. But also check out some of the other capabilities. To do this with re's, you have to clutter up the re expression with field names, as in:

    r'<widget class="(?P<class>.*)" id="(?P<id>.*)">'

The parsing patterns generated by makeXMLTags give dict-like and attribute-like access to any attributes included with the tag. The id can be referenced as widget.id; if not for the unfortunate attribute name "class" (which is a Python keyword), you could reference widget.class the same way, but as it is you have to use widget["class"].

If you are parsing HTML, there is also makeHTMLTags, which creates patterns that are less rigid about upper/lower case and other XML strictnesses.

-- Paul
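
For reference, here is a minimal stdlib-only sketch of the match/search difference and of the named-group regexp mentioned above. The sample lines are made up for illustration (only the widget line comes from the original question), find_widgets is a hypothetical helper name, and the sketch assumes the widget lines in the .glade file are indented, which would explain why match fails where grep succeeds. It uses [^"]* instead of .* so that each group stops at its closing quote.

```python
import re

# Illustrative input: only the widget line is taken from the original post.
lines = [
    '<glade-interface>',
    '  <widget class="GtkWindow" id="window1">',
    '    <property name="visible">True</property>',
]

# Named groups capture the attribute values without the surrounding quotes.
pattern = re.compile(r'<widget class="(?P<class>[^"]*)" id="(?P<id>[^"]*)">')


def find_widgets(text_lines):
    """Return (class, id) pairs for every line that contains a widget tag."""
    found = []
    for text in text_lines:
        # search() scans the whole line, the way grep does.
        hit = pattern.search(text)
        if hit:
            found.append((hit.group("class"), hit.group("id")))
    return found


widgets = find_widgets(lines)

# match() only tries the pattern at position 0, so the leading
# whitespace on the widget line is enough to make it fail.
matched_at_start = [bool(pattern.match(text)) for text in lines]

print(widgets)
print(matched_at_start)
```

Stripping the line first (pattern.match(text.strip())) or prefixing the pattern with \s* would also make match work, but search is the closer equivalent to what grep does.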