Re: Trouble splitting strings with consecutive delimiters

2012-05-02 Thread rusi
On May 1, 9:50 am, deuteros deute...@xrs.net wrote:
 I'm using regular expressions to split a string using multiple delimiters.
 But if two or more of my delimiters occur next to each other in the
 string, it puts an empty string in the resulting list. For example:

         re.split(':|;|px', 'width:150px;height:50px;float:right')

 Results in

         ['width', '150', '', 'height', '50', '', 'float', 'right']

 Is there any way to avoid getting '' in my list without adding px; as a
 delimiter?

Are you parsing CSS?
If so, have you tried something like cssutils (http://cthedot.de/cssutils/)?
[There are other such libraries, and I don't know which is best.]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trouble splitting strings with consecutive delimiters

2012-05-01 Thread Jussi Piitulainen
deuteros writes:

 I'm using regular expressions to split a string using multiple
 delimiters.  But if two or more of my delimiters occur next to each
 other in the string, it puts an empty string in the resulting
 list. For example:
 
   re.split(':|;|px', 'width:150px;height:50px;float:right')
 
 Results in
 
   ['width', '150', '', 'height', '50', '', 'float', 'right']
 
 Is there any way to avoid getting '' in my list without adding px;
 as a delimiter?

You could use a sequence of such delimiters.

>>> re.split('(?::|;|px)+', 'width:150px;height:50px;float:right')
['width', '150', 'height', '50', 'float', 'right']

Consider splitting twice instead: first into key-value substrings at
semicolons, and those into key-value pairs at colons. Here as a dict.
Better handle the units after that.

>>> dict(kv.split(':') for kv in
...      'width:150px;height:50px;float:right'.split(';'))
{'width': '150px', 'float': 'right', 'height': '50px'}

You might also want to accept whitespace as part of the delimiters.
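A minimal sketch of the two-step split that also tolerates whitespace around the separators. The helper name parse_declarations is hypothetical, and the input is assumed to be a simple "key:value;" list with no colons or semicolons inside the values.

```python
import re

def parse_declarations(text):
    """Split 'key:value;key:value' text into a dict, ignoring stray whitespace."""
    result = {}
    # First split at semicolons, swallowing whitespace on both sides.
    for chunk in re.split(r'\s*;\s*', text.strip()):
        if not chunk:
            continue  # skip the empty piece left by a trailing semicolon
        # Then split each piece once, at the first colon.
        key, _, value = chunk.partition(':')
        result[key.strip()] = value.strip()
    return result

print(parse_declarations('width: 150px ; height:50px; float : right;'))
```

The units stay attached to the values here, so they can be handled in a separate step.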

(There might be a parser for such data formats somewhere in the
library already. CSV?)


Re: Trouble splitting strings with consecutive delimiters

2012-05-01 Thread Peter Otten
deuteros wrote:

 I'm using regular expressions to split a string using multiple delimiters.
 But if two or more of my delimiters occur next to each other in the
 string, it puts an empty string in the resulting list. For example:
 
 re.split(':|;|px', 'width:150px;height:50px;float:right')
 
 Results in
 
 ['width', '150', '', 'height', '50', '', 'float', 'right']
 
 Is there any way to avoid getting '' in my list without adding px; as a
 delimiter?

That looks like a CSS style; to parse it you should use a tool that was 
built for the job. The first one I came across (because it is included in 
the linux distro I'm using and has css in its name, so this is not an 
endorsement) is

http://packages.python.org/cssutils/

>>> import cssutils
>>> style = cssutils.parseStyle("width:150px;height:50px;float:right")
>>> for property in style.getProperties():
...     print property.name, "--", property.value
...
width -- 150px
height -- 50px
float -- right

OK, so you still need to strip off the unit suffix manually:

>>> def strip_suffix(s, *suffixes):
...     for suffix in suffixes:
...         if s.endswith(suffix):
...             return s[:-len(suffix)]
...     return s
...
>>> strip_suffix(style.float, "pt", "px")
u'right'
>>> strip_suffix(style.width, "pt", "px")
u'150'




Re: Trouble splitting strings with consecutive delimiters

2012-05-01 Thread Steven D'Aprano
On Tue, 01 May 2012 04:50:48 +, deuteros wrote:

 I'm using regular expressions to split a string using multiple
 delimiters. But if two or more of my delimiters occur next to each other
 in the string, it puts an empty string in the resulting list.

As I would expect. After all, there *is* an empty string between two 
delimiters.
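As a small illustration of that point, plain str.split with an explicit separator behaves the same way, while the no-argument form collapses runs of whitespace. The sample strings are made up for this sketch.

```python
# An explicit separator keeps the empty string between adjacent delimiters.
print('a,,b'.split(','))

# With no argument, runs of whitespace count as a single separator.
print('a  b'.split())
```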


 For example:
 
   re.split(':|;|px', 'width:150px;height:50px;float:right')
 
 Results in
 
   ['width', '150', '', 'height', '50', '', 'float', 'right']
 
 Is there any way to avoid getting '' in my list without adding px; as a
 delimiter?

Probably. But why not do it the easy way?


items = re.split(':|;|px', 'width:150px;height:50px;float:right')
items = filter(None, items)

In Python 3, the second line will need to be list(filter(None, items)).
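The same idea can be sketched as a list comprehension, which returns a list on both Python 2 and Python 3. The variable name text is only for illustration.

```python
import re

text = 'width:150px;height:50px;float:right'

# Split on any single delimiter, then drop the empty strings that appear
# wherever two delimiters were adjacent (here 'px' followed by ';').
items = [item for item in re.split(':|;|px', text) if item]
print(items)  # ['width', '150', 'height', '50', 'float', 'right']
```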



-- 
Steven


Re: Trouble splitting strings with consecutive delimiters

2012-05-01 Thread Emile van Sebille

re.split(':|;|px', 'width:150px;height:50px;float:right')


You could recognize that the delimiter you want to strip is in fact px; 
and not px in and of itself.


So, try:

re.split(':|px;', 'width:150px;height:50px;float:right')

Emile






Re: Trouble splitting strings with consecutive delimiters

2012-05-01 Thread Temia Eszteri
 re.split(':|;|px', 'width:150px;height:50px;float:right')

You could recognize that the delimiter you want to strip is in fact px; 
and not px in and of itself.

So, try:

re.split(':|px;', 'width:150px;height:50px;float:right')

Emile

That won't work at all outside of the example case. It'd choke on any
attribute separator that didn't end in px.
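A quick sketch of that failure, using the same declarations reordered so that the one without a px suffix sits in the middle. The reordered string is made up for illustration.

```python
import re

# Same style, but 'float:right' is now followed by a bare ';'.
text = 'width:150px;float:right;height:50px'
parts = re.split(':|px;', text)
print(parts)  # ['width', '150', 'float', 'right;height', '50px']
```

The bare ';' is no longer a delimiter, so 'right;height' survives as one item, and the final '50px' keeps its unit because no ';' follows it.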

Honestly I'd recommend recovering the size measurement anyway, since
there are pretty huge differences between each form of measurement in
CSS. Separating it from the number itself is fine and all since you
probably still need to turn it into a number Python can use, but I
wouldn't discard it outright.
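One possible way to keep the unit while still getting a usable number is sketched below. The helper split_unit and the pattern VALUE_RE are hypothetical, and the pattern only covers simple values such as '150px' or '1.5em', not the full CSS value grammar.

```python
import re

# Assumed value shape: an optional sign, digits with an optional fraction,
# then an optional unit such as px, pt, em or %.
VALUE_RE = re.compile(r'^([+-]?\d+(?:\.\d+)?)([a-z%]*)$', re.IGNORECASE)

def split_unit(value):
    """Return (number, unit) for values like '150px', else (None, value)."""
    match = VALUE_RE.match(value.strip())
    if match is None:
        return None, value  # not numeric, e.g. 'right'
    return float(match.group(1)), match.group(2)

print(split_unit('150px'))  # (150.0, 'px')
print(split_unit('1.5em'))  # (1.5, 'em')
print(split_unit('right'))  # (None, 'right')
```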

~Temia
--
When on earth, do as the earthlings do.


Re: Trouble splitting strings with consecutive delimiters

2012-05-01 Thread Emile van Sebille

On 5/1/2012 10:13 AM Temia Eszteri said...

re.split(':|px;', 'width:150px;height:50px;float:right')

Emile


That won't work at all outside of the example case. It'd choke on any
attribute separator that didn't end in px.


It would certainly choke on all delimiters that are not presented in the 
argument.  You're free to flavor to taste...


Emile

