On 16 July 2013 20:48, <python-list-requ...@python.org> wrote:

> From: "Anders J. Munch" <2...@jmunch.dk>
> Date: Tue, 16 Jul 2013 13:38:35 +0200
>
> Ben Last wrote:
>
>> north_american_number_re = (RE().start
>>     .literal('(').followed_by.exactly(3).digits.then.literal(')')
>>     .then.one.literal("-").then.exactly(3).digits
>>     .then.one.dash.followed_by.exactly(4).digits.then.end
>>     .as_string())
>
> Very cool. It's a bit verbose for my taste, and I'm not sure how well it
> will cope with nested structure.
I guess verbosity is the aim, in that *explicit is better than implicit* :)
And I suppose that's one of the attributes of a fluent system; they tend to
need more typing. It's not Perl...

> The problem with Perl-style regexp notation isn't so much that it's terse
> - it's that the syntax is irregular (sic) and doesn't follow modern
> principles for lexical structure in computer languages. You can get a long
> way just by ignoring whitespace, putting literals in quotes and allowing
> embedded comments.

Good points. I wanted to find a syntax that allows comments as well as being
fluent:

(RE()
    .any_number_of.digits  # Recall that any_number_of includes zero
    .followed_by.an_optional.dot.then.at_least_one.digit
    # The dot is specifically optional
    # but we must have one digit as a minimum
    .as_string())

... and yes, I also specifically wanted to have literals quoted.

Nested groups work, but I haven't tackled lookahead and backreferences,
essentially because if you're writing an RE that complex, you should probably
be working directly in RE strings.

Depending on what you mean by "nested", re-use of RE objects is easy (example
from the unit tests):

identifier_start_chars = RE().regex("[a-zA-Z_]")
identifier_chars = RE().regex("[a-zA-Z0-9_]")
self.assertEqual(RE().one_or_more.of(identifier_start_chars)
                 .followed_by.zero_or_more(identifier_chars)
                 .as_string(),
                 r"[a-zA-Z_]+[a-zA-Z0-9_]*")

Thanks for the comments!
ben
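For comparison, here is a rough sketch of the same three patterns written with only the standard library's `re` module and its `re.VERBOSE` flag, which already gives the "ignore whitespace, allow embedded comments" behaviour mentioned in the quoted message. It does not use the fluent `RE()` builder. The names `DECIMAL_RE`, `NORTH_AMERICAN_NUMBER_RE` and `IDENTIFIER_PATTERN` are made up for illustration, and the decimal pattern is only one reading of what the fluent chain is meant to produce.

```python
import re

# Assumed reading of the fluent example: any number of digits,
# an optional dot, then at least one digit.
DECIMAL_RE = re.compile(r"""
    \d*     # any number of digits, including zero
    \.?     # the dot is specifically optional
    \d+     # but we must have one digit as a minimum
""", re.VERBOSE)

# The quoted phone-number example, e.g. "(555)-123-4567".
NORTH_AMERICAN_NUMBER_RE = re.compile(r"""
    ^
    \( \d{3} \)   # exactly three digits inside literal parentheses
    -             # one dash
    \d{3}         # exactly three digits
    -             # one dash
    \d{4}         # exactly four digits
    $
""", re.VERBOSE)

# Re-use by plain string composition, mirroring the unit-test example.
IDENTIFIER_START_CHARS = "[a-zA-Z_]"
IDENTIFIER_CHARS = "[a-zA-Z0-9_]"
IDENTIFIER_PATTERN = IDENTIFIER_START_CHARS + "+" + IDENTIFIER_CHARS + "*"

if __name__ == "__main__":
    print(bool(DECIMAL_RE.fullmatch("3.14")))
    print(bool(NORTH_AMERICAN_NUMBER_RE.match("(555)-123-4567")))
    print(IDENTIFIER_PATTERN)
```

The trade-off is the one discussed in the thread: the verbose form keeps the comments next to each piece, but the pieces are still Perl-style notation instead of quoted literals and named quantifiers.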