On 2013-08-28, John Levine <jo...@iecc.com> wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python.  (It's based on tinydns if you know what
> that is.)
>
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square
> brackets.  I have been messing around with re.split() and
> re.findall() and haven't been able to come up with either a
> working separator pattern for split() or a working field
> pattern for findall().  I came pretty close with findall() but
> can't get it to reliably match the nothing between two adjacent
> colons not inside brackets.
>
> Any suggestions?  I realize I could do it in a loop where I pick
> stuff off the front of the string, but yuck.

A little parser, as Skip suggested, is a good way to go. The
brackets make the meaning of a colon depend on context, which is
difficult to express cleanly in a regex.

I initially hoped a csv module dialect could work, but the quote
character is (currently) hard-coded to be a single, simple
character, i.e., I can't tell it to treat [xxx] as "xxx".

So, back to Skip's suggestion: a little parser. It might seem
crass or something, but it really is easier than muscling a
regex into a context-sensitive grammar.

def dns_split(s):
    in_brackets = False
    b = 0 # index of beginning of current string
    for i, c in enumerate(s):
        if not in_brackets:
            if c == "[":
                in_brackets = True
            elif c == ':':
                yield s[b:i]
                b = i+1
        elif c == "]":
            in_brackets = False

>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

It'll gag on nested brackets (fixable with a counter) and has no
error handling (requires thought), but it's a start. Note that it
treats each colon as a terminator, so anything after the last
colon is dropped; that matches your example, which ends in a
colon.

-- 
Neil Cerutti
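
P.S. In case it helps, here is a rough sketch of the counter idea
mentioned above. The name dns_split_nested is made up for
illustration, and two choices are assumptions on my part rather
than anything tinydns specifies: unbalanced brackets raise
ValueError, and any text after the last colon is yielded as a
final field.

```python
def dns_split_nested(s):
    """Split s on colons that sit outside square brackets.

    Hypothetical variant of dns_split: a depth counter replaces the
    in_brackets flag so nested brackets are handled.
    """
    depth = 0  # number of currently open brackets
    b = 0      # index of beginning of current field
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                # Assumption: a stray ']' is an error, not literal text.
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            yield s[b:i]
            b = i + 1
    if depth:
        raise ValueError("unclosed '[' in %r" % s)
    if b < len(s):
        # Assumption: leftover text after the last colon is a field too.
        yield s[b:]
```

On the sample record this gives the same five fields as
dns_split, and 'a[b[c:d]:e]:f' splits into 'a[b[c:d]:e]' and 'f'.
Since it is a generator, the ValueError only shows up once you
iterate far enough to hit the problem.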