> Hello, I am making an app that reads from an HTML file output by MS
> Word (yes, it's for those people who need to make web pages but don't know
> how to write HTML). Anyway, using MS Word is a requirement. After the user
> saves their .doc file as a web page (now an .htm file), the PHP script will
> take that HTML file from a directory on the server, open it, read it, and
> ignore everything from the beginning of the file up to and including the
> opening body tag. Then it must ignore everything at the end of the page,
> from the closing body tag through the closing html tag. So basically,
> after it's done doing its thing, I would have all the content of the page
> ready to be echoed inside another page that would be a sort of shell or
> template.
> I am looking right now at regular expressions, fopen, etc., but
> just to give you an idea and to see if anybody has any helpful pointers,
> this (yes, can you believe it?) is the beginning of the word2html
> translation that MS Word does: (BAH!) (I have to get rid of this,
> remember?)

Here is an example regular expression that someone on this group gave me. It
gives everything between the body tags.
$html_text = '
Blah Blah Blah Blah
echo $html_text;
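
As a rough sketch of that kind of expression (not necessarily the exact one from the group), something like the following should pull out everything between the body tags. The function name extract_body() and the sample markup are made up for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Sketch only: extract_body() is a hypothetical helper, not from the original post.
function extract_body(string $html): ?string
{
    // <body ...> may carry attributes. The i modifier ignores case,
    // and the s modifier lets . match newlines as well.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null; // no body tags found
}

// Made-up sample document standing in for the Word output.
$html_text = '<html><head><title>Test</title></head>
<body lang=EN-US>
Blah Blah Blah Blah
</body></html>';

echo extract_body($html_text);
```

For a file saved on the server, the string could come from file_get_contents() on that file instead of the inline sample.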

Here is a class that removes unneeded Word 2000 HTML tags:
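
A minimal sketch of what such a cleanup class could look like, assuming the usual Word 2000 leftovers: conditional comments, embedded xml blocks, and namespaced tags such as o:p. The class name WordHtmlCleaner and its clean() method are made up for this example and are not the original class.

```php
<?php
// Sketch only: WordHtmlCleaner is a hypothetical name, not the original class.
class WordHtmlCleaner
{
    public function clean(string $html): string
    {
        $patterns = [
            '/<!--\[if.*?<!\[endif\]-->/is',  // conditional comments (often wrap <xml> blocks)
            '/<xml\b[^>]*>.*?<\/xml>/is',      // embedded Office XML left outside comments
            '/<\/?[a-z0-9]+:[^>]*>/i',         // namespaced tags such as <o:p> or <v:shape>
        ];
        // preg_replace() returns null on a PCRE error; fall back to the input then.
        return preg_replace($patterns, '', $html) ?? $html;
    }
}

$cleaner = new WordHtmlCleaner();
echo $cleaner->clean('<p>Hello<o:p></o:p></p><!--[if gte mso 9]><xml><w:x/></xml><![endif]-->');
```

The patterns are applied in order, so the conditional comments go first and take any xml blocks inside them along.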

If you need the styling, you will need an extra regular expression to pull
the style block out of the head, and perhaps write it to a file.
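
One way that extra step might look, as a sketch: collect the contents of every style block and save them as a stylesheet. The function name extract_styles() and the output path are assumptions made for this example. Word tends to wrap its rules in HTML comment markers, so the sketch strips those as well.

```php
<?php
// Sketch only: extract_styles() and the output file name are hypothetical.
function extract_styles(string $html): string
{
    // Capture the contents of every <style> block (case-insensitive, across newlines).
    preg_match_all('/<style\b[^>]*>(.*?)<\/style>/is', $html, $matches);

    $blocks = array_map(function ($css) {
        // Drop a leading <!-- and a trailing --> if the rules are wrapped in them.
        return trim(preg_replace('/^\s*<!--|-->\s*$/', '', $css));
    }, $matches[1]);

    return implode("\n", $blocks);
}

// Made-up sample standing in for the Word output.
$html_text = '<html><head><style><!-- p.MsoNormal {margin:0in;} --></style></head>'
           . '<body><p class=MsoNormal>Hi</p></body></html>';

// Written to the temp directory here; a real app would pick its own location.
file_put_contents(sys_get_temp_dir() . '/word-styles.css', extract_styles($html_text));
```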
If you don't need styling, I would recommend parsing the document itself and
removing all the class="" and style="" attributes.
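
A quick sketch of the attribute removal, using a regular expression rather than a real HTML parser. The name strip_presentation() is made up for this example. It assumes attribute values are double-quoted, single-quoted, or bare, since Word often writes things like class=MsoNormal without quotes.

```php
<?php
// Sketch only: strip_presentation() is a hypothetical helper.
function strip_presentation(string $html): string
{
    // Matches class=... or style=... where the value is "quoted", 'quoted', or bare.
    $pattern = '/\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';
    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo strip_presentation('<p class=MsoNormal style="margin:0in">Hi</p>');
// prints: <p>Hi</p>
```

Being a plain regular expression, this would also strip matching text that happens to appear outside a tag, so it is best run on the extracted body only.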

JJ Harrison

