"Paddy" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > It's difficult to switch to parsers for me even though examples like > pyparsing seem readable, I do want to skip what I am not interested in > rather than having to write a parser for everything. But converely, > when something skipped does bite me - I want to be able to easily add > it in. > > Are their any examples of this kind of working with parsers? >
pyparsing offers several flavors of skipping over uninteresting text. The
most obvious is scanString. scanString is a generator function that scans
through the input text looking for pattern matches (multiple patterns can be
OR'ed together) - when a match is found, the matching tokens, start, and end
locations are yielded. Here's a short example that ships with pyparsing:

from pyparsing import Word, alphas, alphanums, Literal, restOfLine, OneOrMore, Empty

# simulate some C++ code
testData = """
#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;
CORBA::initORB("xyzzy", USERNAME, PASSWORD );
"""

#################
print "Example of an extractor"
print "----------------------"

# simple grammar to match #define's
ident = Word(alphas, alphanums+"_")
macroDef = Literal("#define") + ident.setResultsName("name") + "=" + restOfLine.setResultsName("value")
for t,s,e in macroDef.scanString( testData ):
    print t.name,":", t.value

# or a quick way to make a dictionary of the names and values
macros = dict([(t.name,t.value) for t,s,e in macroDef.scanString(testData)])
print "macros =", macros
print

--------------------
prints:
Example of an extractor
----------------------
MAX_LOCS : 100
USERNAME : "floyd"
PASSWORD : "swordfish"
macros = {'USERNAME': '"floyd"', 'PASSWORD': '"swordfish"', 'MAX_LOCS': '100'}

Note that scanString worked only with the expressions we defined, and
ignored pretty much everything else.

scanString has a companion method, transformString. transformString calls
scanString internally - the purpose is to apply any parse actions or
suppressions on the matched tokens, substitute them back in for the original
text, and then return the transformed string.

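Two points from the scanString description above are not exercised by the shipped example: OR'ing patterns together, and the start/end locations that come back with each match. The sketch below is one way this might look, and it also speaks to the "easily add it in" part of the question: something that used to be skipped is added by OR'ing one more expression onto the scanner. The names test_data, macro_def, interesting and found are made up for this illustration, and it is written with Python 3 print() calls rather than the Python 2 style used elsewhere in this post.

```python
from pyparsing import Word, alphas, alphanums, Literal, restOfLine, quotedString

# hypothetical sample input, modelled on the testData used in this post
test_data = '''
#define MAX_LOCS=100
#define USERNAME = "floyd"
a = MAX_LOCS;
CORBA::initORB("xyzzy", USERNAME);
'''

ident = Word(alphas, alphanums + "_")
macro_def = (Literal("#define") + ident.setResultsName("name")
             + "=" + restOfLine.setResultsName("value"))

# "adding in" something that was previously skipped:
# OR one more expression onto what the scanner looks for
interesting = macro_def | quotedString

# start and end are offsets into the scanned text, so slicing with
# them recovers the exact source text of each match
found = []
for tokens, start, end in interesting.scanString(test_data):
    found.append((test_data[start:end], start, end))

for text, start, end in found:
    print(start, end, text)
```

With this input the scanner should report the two #define lines and the "xyzzy" string literal. The quoted "floyd" is not reported on its own, because it is consumed as part of the #define match that starts earlier on the same line. Everything matching neither alternative is skipped as before.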
Here are two transformer examples, one uses the macros dictionary we just
created, and does simple macro substitution; the other converts
C++-namespaced references to C-compatible global symbols (something we had
to do in the early days of CORBA):

#################
print "Examples of a transformer"
print "----------------------"

# convert C++ namespaces to mangled C-compatible names
scopedIdent = ident + OneOrMore( Literal("::").suppress() + ident )
scopedIdent.setParseAction(lambda s,l,t: "_".join(t))

print "(replace namespace-scoped names with C-compatible names)"
print scopedIdent.transformString( testData )

# or a crude pre-processor (use parse actions to replace matching text)
def substituteMacro(s,l,t):
    if t[0] in macros:
        return macros[t[0]]
ident.setParseAction( substituteMacro )
ident.ignore(macroDef)

print "(simulate #define pre-processor)"
print ident.transformString( testData )

--------------------------
prints:
Examples of a transformer
----------------------
(replace namespace-scoped names with C-compatible names)

#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;
CORBA_initORB("xyzzy", USERNAME, PASSWORD );

(simulate #define pre-processor)

#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = 100;
CORBA::initORB("xyzzy", "floyd", "swordfish" );

I'd say it took me about 8 weeks to develop a complete Verilog parser using
pyparsing, so I can sympathize that you wouldn't want to write a complete
parser for it. But the individual elements are pretty straightforward, and
can map to pyparsing expressions without much difficulty.

Lastly, pyparsing is not as fast as RE's. But early performance problems can
often be improved through some judicious grammar tuning. And for many
parsing applications, pyparsing is plenty fast enough.

Regards,
-- Paul