"Paddy" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Proposal: Named RE variables > ====================== > > The problem I have is that I am writing a 'good-enough' verilog tag > extractor as a long regular expression (with the 'x' flag for > readability), and find myself both > 1) Repeating sections of the RE, and > 2) Wanting to add '(?P<some_clarifier>...) ' around sections > because I know what the section does but don't really want > the group. > > If I could write: > (?P/verilog_name/ [A-Za-z_][A-Za-z_0-9\$\.]* | \\\S+ ) > > ...and have the RE parser extract the section of RE after the second > '/' and store it associated with its name that appears between the > first two '/'. The RE should NOT try and match against anything between > the outer '(' ')' pair at this point, just store. > > Then the following code appearing later in the RE: > (?P=verilog_name) > > ...should retrieve the RE snippet named and insert it into the RE > instead of the '(?P=...)' group before interpreting the RE 'as normal' > > Instead of writing the following to search for event declarations: > vlog_extract = r'''(?smx) > # Verilog event definition extraction > (?: event \s+ [A-Za-z_][A-Za-z_0-9\$\.]* \s* (?: , \s* > [A-Za-z_][A-Za-z_0-9\$\.]*)* ) > ''' > I could write the following RE, which I think is clearer: > vlog_extract = r'''(?smx) > # Verilog identifier definition > (?P/IDENT/ [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.) ) > # Verilog event definition extraction > (?: event \s+ (?P=IDENT) \s* (?: , \s* (?P=IDENT))* ) > ''' >
By contrast, the event declaration expression in the pyparsing Verilog
parser is:

    identLead = alphas+"$_"
    identBody = alphanums+"$_"
    #~ identifier = Combine( Optional(".") +
    #~     delimitedList( Word(identLead, identBody), ".", combine=True ) ).setName("baseIdent")
    # replace pyparsing composition with Regex - improves performance ~10% for this construct
    identifier = Regex( r"\.?["+identLead+"]["+identBody+r"]*(\.["+identLead+"]["+identBody+"]*)*" ).setName("baseIdent")

    eventDecl = Group( "event" + delimitedList( identifier ) + semi )

But why do you need an update to RE to compose snippets?  Especially
snippets that you can only use in the same RE?  Just do string interp:

> I could write the following RE, which I think is clearer:
>   vlog_extract = r'''(?smx)
>     # Verilog identifier definition
>     (?P/IDENT/ [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.) )
>     # Verilog event definition extraction
>     (?: event \s+ (?P=IDENT) \s* (?: , \s* (?P=IDENT))* )
>   '''

    IDENT = r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"
    vlog_extract = r'''(?smx)
        # Verilog event definition extraction
        (?: event \s+ %(IDENT)s \s* (?: , \s* %(IDENT)s)* )
        ''' % locals()

Yuk, this is a mess - which '%' signs are part of RE and which are for
string interp?  Maybe just plain old string concat is better:

    IDENT = r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"
    vlog_extract = r'''(?smx)
        # Verilog event definition extraction
        (?: event \s+ ''' + IDENT + r''' \s* (?: , \s* ''' + IDENT + r''')* )'''

(Every piece is a raw string here, so the backslashes reach the RE
engine untouched.)

By the way, your IDENT is not totally accurate - it does not permit a
leading ".", and it does permit leading digits in identifier elements
after the first ".".  So ".goForIt" would not be matched as a valid
identifier when it should, and "go.4it" would be matched as valid when
it shouldn't (at least as far as I read the Verilog grammar).

(Pyparsing (http://sourceforge.net/projects/pyparsing/) is open source
under the MIT license.  The Verilog grammar is not distributed with
pyparsing, and is only available free for noncommercial use.)
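
To make the two points above concrete, here is a rough sketch that
composes the event RE by string concatenation and checks both IDENT
variants against ".goForIt" and "go.4it".  The names ELEM, EVENT_DECL,
accepts() and event_names() are made up for this example, and the
"fixed" IDENT is only my reading of the pyparsing Regex, not an
authoritative Verilog identifier:

```python
import re

# IDENT as written in the quoted post; the space before (?!\.) means it
# only behaves as intended when compiled with re.VERBOSE.
ORIG_IDENT = r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"

# Hypothetical replacement modelled on the pyparsing Regex: an optional
# leading ".", then "."-separated elements that each start with a
# letter, "_" or "$" (never a digit).
ELEM = r"[A-Za-z_$][A-Za-z_0-9$]*"
IDENT = r"\.?" + ELEM + r"(?:\." + ELEM + r")*"

# Plain string concatenation, every piece a raw string.
EVENT_DECL = re.compile(
    r"\b event \s+ (?P<names> " + IDENT
    + r" (?: \s* , \s* " + IDENT + r" )* ) \s* ;",
    re.VERBOSE,
)


def accepts(pattern, text):
    """True if the whole of text matches pattern (verbose mode)."""
    return re.fullmatch(pattern, text, re.VERBOSE) is not None


def event_names(source):
    """Return the identifiers of the first event declaration, if any."""
    m = EVENT_DECL.search(source)
    if m is None:
        return []
    return re.findall(IDENT, m.group("names"))


if __name__ == "__main__":
    for name in (".goForIt", "go.4it"):
        print(name, accepts(ORIG_IDENT, name), accepts(IDENT, name))
    print(event_names("event done, ready.flag;"))
```

The snippet is reused twice in EVENT_DECL without any change to the RE
syntax, which is all the proposed (?P/name/...) form would buy you.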
-- Paul