Hello everybody.
Probably the title of this post is not very clear, sorry for that ;).
I have a bunch of text (HTML code) and need to find <p> tags together with
their classes, ids, styles (if any), etc. I'm doing this with the following regexes:
<p(.*?)> or (<p([^>]+))>
My text follows this pattern:
<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur
adipiscing elit.</p>
<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper
efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum
volutpat.</p>
<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae
velit.</p>
The problem is that I need to find every occurrence of the <p> tag except
those with one certain class (i.e. I want to skip paragraph tags of that
class). I don't know how to write a regex exclusion that is treated as a
whole string rather than as a set of individual characters. I tried to use
back-references, with no success. I want to use a regex because the classes
to be avoided are different on each page (though they follow a certain
pattern), and the job should be as automated as possible (the code should
be as versatile as possible).
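
To make the goal concrete, here is a rough sketch in Python of the behaviour I am after, run against the sample text above. It relies on a negative lookahead, which is one common way to exclude a whole string rather than single characters. The names AVOID and P_TAG are made up for this illustration, and I am assuming the regex engine supports lookahead (Python's re module does; I have not checked this elsewhere).

```python
import re

# Sample text copied from the post above.
html = '''<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur
adipiscing elit.</p>
<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper
efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum
volutpat.</p>
<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae
velit.</p>'''

# Hypothetical sub-pattern for the class to skip; it could be any regex
# describing the unwanted class names on a given page.
AVOID = r'THIS_SHOULD_BE_AVOIDED'

# <p\b            : an opening p tag (not <pre>, not </p>)
# (?![^>]*...)    : negative lookahead; reject the tag if the unwanted
#                   class appears anywhere before the closing '>'
# ([^>]*)         : capture the attributes of the tags that remain
P_TAG = re.compile(r'<p\b(?![^>]*\bclass="' + AVOID + r'")([^>]*)>')

attrs = [m.group(1) for m in P_TAG.finditer(html)]
print(attrs)
```

With the sample text this should report the attributes of the navi_buttons, reg and nav paragraphs and skip the third one. The sketch only handles a class attribute holding exactly one double-quoted value, so it is a starting point rather than a general HTML solution.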
I will appreciate any help. Kind regards,
gordom