On 12/10/2007, Keith Bates <[EMAIL PROTECTED]> wrote:
>
> On Fri, 12 Oct 2007 01:14:08 +0100
> Harold Fuchs <[EMAIL PROTECTED]> wrote:
>
> > On 11/10/2007 12:33, Keith Bates wrote:
> > > Hello,
> > >
> > > Someone has sent me one of those evil files in the Microsoft 2007
> > > docx format.
> > >
> > >
> > > Failing that, can someone tell me how to use regular expressions to
> > > remove the formatting commands in the basic xml file i.e.
> > > <I want to remove anything between these brackets>
> > >
> > > No matter how long I look at the help file my brain just does not
> > > get reg ex !
> > >
> > > Thanks
> > >
> >
> > If this would work for you the solution is simple *providing* you can
> > open the xml file in a text editor (vi ???) that supports regular
> > expressions.
> >
> > Change the regular expression "<.*>" (that's less-than, dot,
> > asterisk, greater-than, without the quotes) to nothing (a null
> > string). The less-than and greater-than characters are interpreted
> > literally; the dot star is interpreted as "any number, including
> > zero, of any character".
> >
> > However, I'm not sure if this is a good idea as it will lose *all*
> > the formatting from the document. If that's OK with you then ...
> >
>
> Thanks for that Harold.
>
> Unfortunately, doing this not only lost the formatting but 90% of the
> document as well!

Keith, I think I goofed.

If you use an editor that *properly* supports regular expressions, then
what I should have told you to change was "<[^>]*>" without the quotes.
That's less-than, open-square-bracket, circumflex, greater-than,
close-square-bracket, asterisk, greater-than. The bracketed part means
"any character except greater-than", so each match stops at the first
greater-than it meets.

I forgot that in a proper implementation the asterisk is "greedy":
"<.*>" matches from the first less-than to the last greater-than on a
line, which is why (I think) you lost your text.

Even with this you'll lose all the formatting.

-- 
Harold Fuchs
London, England
Please reply *only* to [email protected]
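
For anyone who would rather script this than do it in an editor, here is a minimal Python sketch of the difference between the greedy pattern "<.*>" and the negated character class "<[^>]*>". The sample string and the two function names are made up for illustration; they are not taken from Keith's file, and real docx XML will contain far more markup than this.

```python
import re

# Hypothetical fragment standing in for one line of the document's XML.
sample = "<w:p><w:r><w:t>Hello</w:t></w:r> <w:r><w:t>world</w:t></w:r></w:p>"


def strip_tags_greedy(text):
    # ".*" is greedy: it runs from the first "<" to the LAST ">" on the
    # line, so everything between the tags is swallowed as well.
    return re.sub(r"<.*>", "", text)


def strip_tags(text):
    # "[^>]*" cannot cross a ">", so each match is exactly one tag.
    return re.sub(r"<[^>]*>", "", text)


print(repr(strip_tags_greedy(sample)))  # ''
print(repr(strip_tags(sample)))         # 'Hello world'
```

If the XML sits on one or two very long lines, the greedy version removes nearly everything on each line, which would be consistent with losing most of the document. Either way, the second version still discards all the formatting, exactly as warned above.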
