Re: simple regular expression problem

Jason Drew Mon, 17 Sep 2007 06:28:40 -0700

You just need a one-character addition to your regex:

regex = re.compile(r'<organisatie.*?</organisatie>', re.S)


Note, there is now a question mark (?) after the .*

By default, regular expressions are "greedy" and will grab as much
text as possible when making a match. So your original expression was
grabbing everything between the first opening tag and the last closing
tag. The question mark says, don't be greedy, and you get the
behaviour you need.

This is covered in the documentation for the re module.
http://docs.python.org/lib/module-re.html

Jason

On Sep 17, 9:00 am, duikboot <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am trying to extract a list of strings from a text. I am looking it
> for hours now, googling didn't help either.
> Could you please help me?
>
> >>>s = """ 
> >>>\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> >>> L = regex.findall(s)
> >>> print L
>
> ['organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie']
>
> I expected:
> [('organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>), (<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</
> organisatie')]
>
> I must be missing something very obvious.
>
> Greetings Arjen


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: simple regular expression problem

Reply via email to