Re: Trouble splitting strings with consecutive delimiters
On May 1, 9:50 am, deuteros deute...@xrs.net wrote:
> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting list.
>
> For example:
>
>     re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as
> a delimiter?

Are you parsing CSS? If so, have you tried things like cssutils
(http://cthedot.de/cssutils/)?
[There are other such... And I don't know which is best...]

-- http://mail.python.org/mailman/listinfo/python-list
Re: Trouble splitting strings with consecutive delimiters
deuteros writes:
> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting list.
>
> For example:
>
>     re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as
> a delimiter?

You could use a sequence of such delimiters.

    >>> re.split('(?::|;|px)+', "width:150px;height:50px;float:right")
    ['width', '150', 'height', '50', 'float', 'right']

Consider splitting twice instead: first into key-value substrings at
semicolons, and those into key-value pairs at colons. Here as a dict.
Better handle the units after that.

    >>> dict(kv.split(':') for kv in "width:150px;height:50px;float:right".split(';'))
    {'width': '150px', 'float': 'right', 'height': '50px'}

You might also want to accept whitespace as part of the delimiters.

(There might be a parser for such data formats somewhere in the
library already. CSV?)
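A minimal Python 3 sketch of the two suggestions in this reply, assuming that optional whitespace around each delimiter should be swallowed too. The names `style`, `DELIMS`, and `parse_pairs` are illustrative and do not come from the thread, and the input string with extra spaces is made up for the example.

```python
import re

# Made-up input with stray whitespace around some delimiters.
style = "width: 150px; height:50px ;float:right"

# One or more delimiters in a row, each optionally surrounded by
# whitespace, are treated as a single separator.
DELIMS = re.compile(r"(?:\s*(?::|;|px)\s*)+")

tokens = DELIMS.split(style)
print(tokens)


def parse_pairs(text):
    """Split at semicolons first, then at the first colon of each piece."""
    pairs = {}
    for chunk in text.split(";"):
        if not chunk.strip():
            continue  # skip empty pieces, e.g. after a trailing semicolon
        key, _, value = chunk.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


pairs = parse_pairs(style)
print(pairs)
```

The two-step version keeps the unit attached to the value (`'150px'`), so it can be handled separately afterwards, as the reply recommends.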
Re: Trouble splitting strings with consecutive delimiters
deuteros wrote:
> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting list.
>
> For example:
>
>     re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as
> a delimiter?

That looks like a CSS style; to parse it you should use a tool that
was built for the job. The first one I came across (because it is
included in the Linux distro I'm using and has css in its name, so
this is not an endorsement) is

http://packages.python.org/cssutils/

    >>> import cssutils
    >>> style = cssutils.parseStyle("width:150px;height:50px;float:right")
    >>> for property in style.getProperties():
    ...     print property.name, "--", property.value
    ...
    width -- 150px
    height -- 50px
    float -- right

OK, so you still need to strip off the unit suffix manually:

    >>> def strip_suffix(s, *suffixes):
    ...     for suffix in suffixes:
    ...         if s.endswith(suffix):
    ...             return s[:-len(suffix)]
    ...     return s
    ...
    >>> strip_suffix(style.float, "pt", "px")
    u'right'
    >>> strip_suffix(style.width, "pt", "px")
    u'150'
Re: Trouble splitting strings with consecutive delimiters
On Tue, 01 May 2012 04:50:48 +, deuteros wrote:
> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting list.

As I would expect. After all, there *is* an empty string between two
delimiters.

> For example:
>
>     re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as
> a delimiter?

Probably. But why not do it the easy way?

    items = re.split(':|;|px', "width:150px;height:50px;float:right")
    items = filter(None, items)

In Python 3, the second line will need to be list(filter(None, items)).

-- Steven
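For reference, a short Python 3 sketch of the filtering approach described in this reply. The variable names `non_empty` and `also_non_empty` are illustrative; the list comprehension is an equivalent alternative, not something proposed in the thread.

```python
import re

items = re.split(":|;|px", "width:150px;height:50px;float:right")
print(items)  # contains '' wherever "px" is directly followed by ";"

# filter(None, ...) keeps only truthy items, which drops the empty strings.
# In Python 3, filter() returns an iterator, hence the list() call.
non_empty = list(filter(None, items))
print(non_empty)

# Equivalent list comprehension.
also_non_empty = [item for item in items if item]
```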
Re: Trouble splitting strings with consecutive delimiters
> re.split(':|;|px', "width:150px;height:50px;float:right")

You could recognize that the delimiter you want to strip is in fact
px; and not px in and of itself.

So, try:

    re.split(':|px;', "width:150px;height:50px;float:right")

Emile
Re: Trouble splitting strings with consecutive delimiters
>> re.split(':|;|px', "width:150px;height:50px;float:right")
>
> You could recognize that the delimiter you want to strip is in fact
> px; and not px in and of itself.
>
> So, try:
>
>     re.split(':|px;', "width:150px;height:50px;float:right")
>
> Emile

That won't work at all outside of the example case. It'd choke on any
attribute separator that didn't end in px.

Honestly, I'd recommend recovering the size measurement anyway, since
there are pretty huge differences between each form of measurement in
CSS. Separating it from the number itself is fine and all, since you
probably still need to turn it into a number Python can use, but I
wouldn't discard it outright.

~Temia
--
When on earth, do as the earthlings do.
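One possible way to separate the number from its unit without discarding either, sketched in Python 3. The `LENGTH` pattern and the `split_length` helper are hypothetical and deliberately simplified: they accept an optionally signed decimal number followed by an alphabetic unit or `%`, and do not attempt to cover everything CSS allows (for example `.5em` or `calc()` expressions).

```python
import re

# Hypothetical, simplified pattern: a signed decimal number followed by an
# optional unit made of letters, or a percent sign.
LENGTH = re.compile(r"([+-]?\d+(?:\.\d+)?)\s*([a-zA-Z]+|%)?")


def split_length(value):
    """Return (number, unit) for values like '150px', or None for others."""
    match = LENGTH.fullmatch(value.strip())
    if match is None:
        return None  # not a plain number-plus-unit value, e.g. 'right'
    number, unit = match.groups()
    return float(number), unit or ""


for value in ("150px", "1.5em", "50%", "50", "right"):
    print(value, "->", split_length(value))
```

Keeping the unit alongside the number leaves the caller free to decide how `px`, `em`, or `%` should be interpreted later.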
Re: Trouble splitting strings with consecutive delimiters
On 5/1/2012 10:13 AM Temia Eszteri said...
>> re.split(':|px;', "width:150px;height:50px;float:right")
>>
>> Emile
>
> That won't work at all outside of the example case. It'd choke on any
> attribute separator that didn't end in px.

It would certainly choke on all delimiters that are not presented in
the argument. You're free to flavor to taste...

Emile
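To make the disagreement concrete, here is a small Python 3 sketch comparing the `':|px;'` pattern on the thread's example and on a reordered string. The reordered input and the extended pattern `':|px;|;|px'` are assumptions added for illustration; the extended pattern is only one way the alternation could be "flavored".

```python
import re

narrow = ":|px;"

# The thread's example: every semicolon happens to follow "px".
original = re.split(narrow, "width:150px;height:50px;float:right")
print(original)

# Reordered input (made up): the semicolon after "right" is not preceded
# by "px", so it is not treated as a delimiter.
reordered = re.split(narrow, "float:right;width:150px")
print(reordered)

# A broader alternation. "px;" is listed before ";" and "px" so that the
# combined delimiter is tried first. A trailing "px" still leaves a final
# empty string, which would need filtering.
broader = re.split(":|px;|;|px", "float:right;width:150px")
print(broader)
```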