On Mar 21, 2005, at 18:45, Enrique wrote:
> Manolis Christodoulou wrote:
>
>> Enrique wrote:
>>> Hi all,
>>> I have realized that in OOo the regular expression search consistently returns the longest possible match within the paragraph.
>>>
>>> I am reformatting an XHTML text (and want to do it within OOo, with no XML processor), for instance:
>>>
>>> <p class="Standard">Escriba aquí España, Ca<span class="T1">2+</span> con <span class="T2">acentos</span> <span class="T3">Camión</span>, y con griegas alfa:<span class="T4">α</span>, beta: <span class="T4">β</span>, y DELTA: <span class="T4">Δ</span> {T}</p>
>>>
>>> But I need to locate the shortest match. An example "<span.*</span>" selects almost the entire paragraph above, while I need just each tag, one after the other.
>> try "<span.*?</span>"
>> ? is the lazy match mark. It works in Perl anyway...
> Already tried, but without luck!
> Thanks anyway!
>
Enrique,
Regular expressions in OOo are greedy, and they match only within a single paragraph. So you'll have to put each set of tags into its own paragraph.
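For comparison outside OOo, here is a minimal sketch of greedy versus lazy matching on a shortened copy of the sample paragraph. It uses Python's `re` module, not OOo's engine, so it only illustrates the behaviour being discussed. The third pattern, with a negated character class, is not from this thread: it is a common substitute where `*?` is unavailable, and whether OOo's search accepts it is an assumption I have not checked.

```python
import re

# Shortened copy of the sample paragraph from the original question.
paragraph = (
    '<p class="Standard">Ca<span class="T1">2+</span> con '
    '<span class="T2">acentos</span> {T}</p>'
)

# Greedy: .* runs on to the LAST </span> in the paragraph.
greedy = re.findall(r"<span.*</span>", paragraph)

# Lazy: .*? stops at the FIRST </span> after each <span.
lazy = re.findall(r"<span.*?</span>", paragraph)

# Negated class (swapped-in technique, not from the thread):
# [^<]* cannot cross a "<", so it ends at the nearest </span>.
# It only works when no other tag is nested inside the span.
no_lazy = re.findall(r"<span[^<]*</span>", paragraph)

print(len(greedy), len(lazy), len(no_lazy))
```

The greedy pattern returns one long match covering both spans, while the other two patterns return each span separately.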
First, double all the paragraph breaks:
Search For: ^$
Replace with: \n\n
(So you can find them again)
Then, break out the tags into individual paragraphs:
Search For: </span>; Replace with: ?\n
This will leave the {T}</p> on its own line.
Do whatever it is you want to do with the tag sets, then reverse the process:
Search for: ^$
Replace with: ****
Search for: $
Replace with: (nothing--delete them)
Search for: \*\*\*\* (escaped, because * is a regular-expression operator)
Replace with: \n
Note: I haven't tested this, so you may have to tweak it.
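The idea behind the steps above (give each tag set its own line, work on the pieces, then join them again) can be sketched in Python. This is only a simulation under stated assumptions: it uses Python's `re` module rather than OOo's Find & Replace, it keeps each `</span>` and breaks the line after it, and the names `pieces`, `spans` and `rejoined` are made up for the illustration.

```python
import re

# Shortened copy of the sample paragraph from the original question.
paragraph = (
    '<p class="Standard">Ca<span class="T1">2+</span> con '
    '<span class="T2">acentos</span> {T}</p>'
)

# Break after every closing tag, so that each piece
# contains at most one <span>...</span> pair.
pieces = re.sub(r"</span>", "</span>\n", paragraph).split("\n")

# Inside a single piece even a greedy pattern is safe:
# there is only one </span> for .* to run up to.
spans = []
for piece in pieces:
    match = re.search(r"<span.*</span>", piece)
    if match:
        spans.append(match.group(0))

# Reverse the process: joining the pieces restores the paragraph
# (assuming the paragraph contained no line breaks to begin with).
rejoined = "".join(pieces)

print(spans)
```

The last piece holds only the trailing ` {T}</p>`, so it contributes no match.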
