On 12/10/2007, Keith Bates <[EMAIL PROTECTED]> wrote:
>
> On Fri, 12 Oct 2007 01:14:08 +0100
> Harold Fuchs <[EMAIL PROTECTED]> wrote:
>
> > On 11/10/2007 12:33, Keith Bates wrote:
> > > Hello,
> > >
> > > Someone has sent me one of those evil files in the Microsoft 2007
> > > docx format.
> > >
> > >
> > > Failing that, can someone tell me how to use regular expressions to
> > > remove the formatting commands in the basic xml file i.e.
> > > <I want to remove anything between these brackets>
> > >
> > > No matter how long I look at the help file my brain just does not
> > > get reg ex !
> > >
> > > Thanks
> > >
> > >
> > If this would work for you the solution is simple *providing* you can
> > open the xml file in a text editor (vi ???) that supports regular
> > expressions.
> >
> > Change the regular expression "<.*>" (that's less-than, dot,
> > asterisk, greater-than, without the quotes) to nothing (a null
> > string). The less-than and greater-than characters are interpreted
> > literally; the dot star is interpreted as "any number, including
> > zero, of any character".
> >
> > However, I'm not sure if this is a good idea as it will lose *all*
> > the formatting from the document. If that's OK with you then ...
> >
>
> Thanks for that Harold.
>
> Unfortunately, doing this not only lost the formatting but 90% of the
> document as well!


Keith, I think I goofed. If you use an editor that *properly* supports
regular expressions then what I should have told you to change was "<[^>]*>"
without the quotes. That's less-than, open-square-bracket, circumflex,
greater-than, close-square-bracket, asterisk, greater-than. The bracketed
part means "any character except greater-than", so each match stops at the
end of a single tag. I forgot that in a proper implementation the asterisk
is "greedy": "<.*>" matches from the first less-than to the last
greater-than on a line, which is why (I think) you lost your text. Even with
the new expression you'll lose all the formatting.
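
To show the difference between the greedy pattern "<.*>" and the
negated-class pattern "<[^>]*>", here is a minimal sketch in Python. Python
is my own choice for illustration (nobody in this thread mentioned it), and
the sample string is made up; it only stands in for the kind of markup found
in the xml file.

```python
import re

# Hypothetical one-line fragment standing in for the xml inside a docx file.
sample = "<w:p><w:r><w:t>Hello</w:t></w:r> <w:r><w:t>world</w:t></w:r></w:p>"

# Greedy: ".*" runs from the first "<" to the LAST ">" on the line,
# so the text between the tags is removed along with the tags.
greedy = re.sub(r"<.*>", "", sample)

# Negated class: "[^>]*" cannot cross a ">", so each tag is matched
# on its own and the text between tags is left alone.
per_tag = re.sub(r"<[^>]*>", "", sample)

print(repr(greedy))
print(repr(per_tag))
```

On this sample the first substitution leaves nothing and the second leaves
only the words. An editor's search-and-replace should behave the same way
provided its regular expressions follow the usual rules, which I have not
checked for any particular editor.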



-- 
Harold Fuchs
London, England
Please reply *only* to [EMAIL PROTECTED]
