Hello everyone,

I'm going nuts with a regex. Could someone please show me what I'm doing wrong?

I have an XMPP message:

<message xmlns='jabber:client' to='n...@host.com'>
   <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
       <parameters>
           <param1>123</param1>
           <param2>456</param2>
       </parameters>
       <payload type='plain'>...</payload>
   </mynode>
   <x xmlns='jabber:x:expire' seconds='15'/>
</message>

The <parameters> node may be absent or empty (<parameters/>), and the <x> node may be absent. I'd like to grab everything except the <payload> node and build a new message from it using a regex. With the XMPP message above, I'd get this:

<message xmlns='jabber:client' to='n...@host.com'>
   <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
       <parameters>
           <param1>123</param1>
           <param2>456</param2>
       </parameters>
   </mynode>
   <x xmlns='jabber:x:expire' seconds='15'/>
</message>
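
If only the stripped message is needed, and not the individual pieces, one possible approach is a substitution that deletes the <payload> element instead of capturing everything around it. This is a minimal sketch under assumptions that are not stated in the post: there is at most one <payload>, it is written as an open/close pair (not self-closing), and it has no nested <payload> inside it. The names PAYLOAD_RE and strip_payload are hypothetical.

```python
import re

# Hypothetical helper, not from the original post.
# \s* in front also swallows the newline and indentation before the element,
# so no blank line is left behind in pretty-printed input.
PAYLOAD_RE = re.compile(r"\s*<payload\b[^>]*>.*?</payload>", re.DOTALL)

def strip_payload(xml_text):
    """Return xml_text with the first <payload>...</payload> element removed."""
    return PAYLOAD_RE.sub("", xml_text, count=1)

msg = (
    "<message xmlns='jabber:client' to='n...@host.com'>"
    "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
    "<parameters><param1>123</param1><param2>456</param2></parameters>"
    "<payload type='plain'>...</payload>"
    "</mynode>"
    "<x xmlns='jabber:x:expire' seconds='15'/>"
    "</message>"
)

stripped = strip_payload(msg)
print(stripped)
```

This keeps the closing tags in place, which a groups-only approach would have to add back by hand.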

For some reason my regex doesn't work correctly:

r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"

I group the opening <message> tag and the opening <mynode> tag. If the <parameters> node is present and not empty, I group it, and if the <x> node is present, I group it too. But the groups don't come out as I expect:

>>> import re
>>> s1 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s2 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s3 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s4 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode></message>"
>>> s5 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode></message>"
>>> s6 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode></message>"
>>> exp = r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
>>>
>>> re.match(exp, s1).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s2).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s3).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s4).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s5).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s6).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
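
A likely explanation for the results above: every `.*?` in the pattern is allowed to match nothing, and every optional group is allowed to be skipped. re.match only needs some match starting at position 0, so each optional group is tried exactly once, at the spot where the previous piece ended. The <x> group is therefore tried right where <payload> begins, fails there, and is skipped, which gives None. The sketch below spells out the whole message instead, so the optional groups cannot be skipped when their nodes are present. It assumes the layout of s1..s6 (no whitespace between tags, exactly one <payload> in open/close form). The names EXP and build are hypothetical.

```python
import re

# Sketch only: every element is matched explicitly and the pattern is
# anchored at the end, so optional groups are not silently skipped.
EXP = re.compile(
    r"(<message [^>]*>)"
    r"(<mynode [^>]*>)"
    r"(?:(<parameters>.*?</parameters>)|<parameters/>)?"
    r"<payload\b[^>]*>.*?</payload>"   # consumed, deliberately not captured
    r"</mynode>"
    r"(<x [^>]*/>)?"
    r"</message>$",
    re.DOTALL,
)

MSG = "<message xmlns='jabber:client' to='n...@host.com'>"
NODE = "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
PARAMS = "<parameters><param1>123</param1><param2>456</param2></parameters>"
PAYLOAD = "<payload type='plain'>...</payload>"
X = "<x xmlns='jabber:x:expire' seconds='15'/>"

def build(params="", x=""):
    """Assemble a test message shaped like s1..s6."""
    return MSG + NODE + params + PAYLOAD + "</mynode>" + x + "</message>"

samples = [
    build(PARAMS, X), build("<parameters/>", X), build("", X),
    build(PARAMS), build("<parameters/>"), build(),
]
for s in samples:
    print(EXP.match(s).groups())
```

Using `[^>]*` instead of `.*` inside the tags also stops the <x> group from running past the end of its own tag.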


Does anyone know what is wrong with my expression?

Thank you,
Gabriel