deuteros wrote:

> I'm using regular expressions to split a string using multiple delimiters.
> But if two or more of my delimiters occur next to each other in the
> string, it puts an empty string in the resulting list. For example:
>
> re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
> ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as a
> delimiter?

That looks like a CSS style; to parse it you should use a tool that was
built for the job. The first one I came across (because it is included in
the Linux distro I'm using and has "css" in its name, so this is not an
endorsement) is

http://packages.python.org/cssutils/

>>> import cssutils
>>> style = cssutils.parseStyle("width:150px;height:50px;float:right")
>>> for property in style.getProperties():
...     print property.name, "-->", property.value
...
width --> 150px
height --> 50px
float --> right

OK, so you still need to strip off the unit suffix manually:

>>> def strip_suffix(s, *suffixes):
...     for suffix in suffixes:
...         if s.endswith(suffix):
...             return s[:-len(suffix)]
...     return s
...
>>> strip_suffix(style.float, "pt", "px")
u'right'
>>> strip_suffix(style.width, "pt", "px")
u'150'
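
If you would rather stay with re.split() as in the original question, here
is a minimal sketch of two ways to avoid the empty strings. The helper names
split_collapsing and split_filtering are made up for illustration, and both
assume the input is as simple as the example string.

```python
import re

STYLE = "width:150px;height:50px;float:right"


def split_collapsing(text):
    # Hypothetical helper: treat any run of adjacent delimiters
    # (":", ";" or "px") as a single separator.
    return re.split(r"(?:[:;]|px)+", text)


def split_filtering(text):
    # Hypothetical helper: keep the original pattern and drop the
    # empty strings afterwards.
    return [part for part in re.split(r":|;|px", text) if part]


print(split_collapsing(STYLE))
print(split_filtering(STYLE))
```

Both calls give ['width', '150', 'height', '50', 'float', 'right'] for the
example. They differ at the edges: with a trailing ";" the collapsing
version still returns a final '' while the filtering version does not.
Either one will also split on a "px" that happens to occur inside a name or
value, which is one more reason to prefer a real CSS parser.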