On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters.  To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:

That's how re.split works, same as str.split...

> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']

Wow! To tell you the truth, I can't even read that... but one wonders:
why don't you just do

    import string

    def ccsplit(s):
        cclist = []
        current_word = ''
        for char in s:
            if char in string.ascii_uppercase:
                # An uppercase letter starts a new word; save the old one.
                if current_word:
                    cclist.append(current_word)
                current_word = char
            else:
                current_word += char
        # Don't forget the last word.
        if current_word:
            cclist.append(current_word)
        return cclist

    >>> ccsplit('fooBarBaz')
    ['foo', 'Bar', 'Baz']

This is arguably *much* easier to read than the re example, and it
doesn't require one to look ahead in the string.

-a
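
For comparison, here is a rough sketch of two regex-based ways to get the same result. It is not from the original exchange: the function names ccsplit_re and ccsplit_findall are made up for illustration, and the first one assumes Python 3.7 or later, where re.split started splitting on empty matches (earlier versions ignored them, which is why the lookaround pattern above returned the string unsplit). The capturing parentheses from the quoted pattern are dropped here, since re.split would otherwise also return the empty captured separators.

```python
import re

def ccsplit_re(s):
    # Zero-width boundary: a lowercase letter behind, an uppercase letter ahead.
    # Assumes Python 3.7+, where re.split splits on empty matches.
    return re.split(r'(?<=[a-z])(?=[A-Z])', s)

def ccsplit_findall(s):
    # Match the words themselves rather than the boundaries between them.
    # Only handles ASCII letters; digits and other characters are dropped.
    return re.findall(r'[A-Z]?[a-z]+', s)

print(ccsplit_re('fooBarBaz'))       # ['foo', 'Bar', 'Baz']
print(ccsplit_findall('fooBarBaz'))  # ['foo', 'Bar', 'Baz']
```

The findall version does not depend on how re.split treats empty matches, but it silently skips anything that is not an ASCII letter, so whether it fits depends on the input.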