Hello everybody.
The title of this post is probably not very clear; sorry about that ;).

I have a bunch of text (HTML code) and need to find the <p> tags along with their classes, IDs, styles (if any), etc. I'm doing this with the following regexes:
<p(.*?)> or (<p([^>]+))>
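For reference, here is a minimal sketch that runs the two patterns above on a small excerpt. It uses Python's re module only as a convenient test bed (an assumption; the patterns may be meant for another regex engine). It also shows two side effects: <p(.*?)> matches <pre> as well, and <p([^>]+)> misses a bare <p> because [^>]+ needs at least one character. The `strict` variant is a suggestion of mine, not part of the original question.

```python
import re

# A small excerpt in the style of the HTML from the question,
# plus a bare <p> and a <pre> to show the edge cases.
html = '''<p class="navi_buttons">Lorem ipsum dolor sit amet.</p>
<p class="reg">Aliquam mi sapien.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p>No attributes here.</p>
<pre>Not a paragraph.</pre>'''

# The two patterns from the question.
lazy = re.compile(r'<p(.*?)>')
negated = re.compile(r'(<p([^>]+))>')

# Suggested variant (an assumption): \b ends the match at the tag name,
# so <pre> is not mistaken for <p>, and [^>]* also accepts a bare <p>.
strict = re.compile(r'<p\b([^>]*)>')

print(lazy.findall(html))
# → [' class="navi_buttons"', ' class="reg"', '', 're']
print(negated.findall(html))
# → [('<p class="navi_buttons"', ' class="navi_buttons"'), ('<p class="reg"', ' class="reg"'), ('<pre', 're')]
print(strict.findall(html))
# → [' class="navi_buttons"', ' class="reg"', '']
```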

My text follows this pattern:

<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>

<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>

<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum volutpat.</p>

<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae velit.</p>


The problem is that I need to find every <p> tag except those with one particular class (i.e. I want to skip paragraph tags of that class). I don't know how to write a regex exclusion that is treated as a whole string rather than as a set of individual characters. I tried using back-references, without success. I want to use a regex because the classes to be avoided differ from page to page (though they follow a certain pattern), and the job should be as automatic as possible (the code should be as versatile as possible).
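One common way to exclude a whole string rather than a set of characters is a negative lookahead, (?!...). Here is a minimal sketch, again assuming Python's re module as the test bed. The function name p_tags_except and the THIS_\w+ pattern are hypothetical, for illustration only, and the sketch assumes class values are written in double quotes, as in the sample.

```python
import re

html = '''<p class="navi_buttons">Lorem ipsum dolor sit amet.</p>
<p class="reg">Aliquam mi sapien.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien.</p>
<p class="nav">Cras nec orci non dolor.</p>'''


def p_tags_except(text, class_regex):
    """Hypothetical helper: return the attribute part of every <p> tag
    whose class value is not fully matched by class_regex (a regex fragment)."""
    # (?!...) is a negative lookahead: right after "<p" it checks, without
    # consuming anything, that the rest of the tag does not contain
    # class="<value matching class_regex>". Unlike [^...], which excludes
    # single characters, it rejects a whole sequence.
    pattern = re.compile(
        r'<p\b(?![^>]*\bclass="(?:' + class_regex + r')")([^>]*)>'
    )
    return pattern.findall(text)


# Exclude one literal class name; re.escape guards against special characters.
print(p_tags_except(html, re.escape("THIS_SHOULD_BE_AVOIDED")))
# → [' class="navi_buttons"', ' class="reg"', ' class="nav"']

# Exclude a family of classes; THIS_\w+ is a hypothetical stand-in for
# whatever pattern the per-page class names actually follow.
print(p_tags_except(html, r"THIS_\w+"))
# → [' class="navi_buttons"', ' class="reg"', ' class="nav"']

# The whole value must match: excluding "nav" keeps "navi_buttons".
print(p_tags_except(html, "nav"))
# → [' class="navi_buttons"', ' class="reg"', ' class="THIS_SHOULD_BE_AVOIDED"']
```

Written out for one literal class, the pattern is <p\b(?![^>]*\bclass="THIS_SHOULD_BE_AVOIDED")[^>]*> . LibreOffice's Find & Replace is based on the ICU regex engine, which I believe also supports lookahead, but I have not tested the pattern there. A tag that lists several classes (class="a THIS_SHOULD_BE_AVOIDED") or uses single quotes would not be excluded by this sketch; for messier HTML, a real HTML parser is the safer route.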
I would appreciate any help.
Kind regards,

gordom

--
To unsubscribe e-mail to: [email protected]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted
