Thanks, I will try to adapt this to my needs.
- Vic
-----Original Message-----
From: JJ Harrison [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 14, 2002 5:44 PM
To: [EMAIL PROTECTED]
Subject: [PHP] Re: html parsing from html file through php
> Hello, I am making an app that reads from an HTML file output by MS
> Word (yes, it's for those people who need to make web pages but don't
> know how to write HTML). Anyway, using MS Word is a requirement. After
> the user saves their .doc file as a web page (now an .htm file), the
> PHP script will take that HTML file from a directory on the server,
> open it, read it, and ignore everything from the beginning of the file
> up to and including the opening body tag. Then it must ignore
> everything at the end of the page, from the closing body tag through
> the closing html tag. So basically, after it's done doing its thing, I
> would have all the content of the page ready to be echoed inside
> another page that would be a sort of shell or template.
>
> I am looking right now at regular expressions, file_open, etc., but
> just to give you an idea and to see if anybody has any helpful
> pointers, this (yes, can you believe it?) is the beginning of the
> word2html translation that MS Word does: (BAH!) (I have to get rid of
> this, remember?)
Here is an example regular expression that someone on this group gave
me. It gives everything between the body tags.
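<?php
// Sample page (tags assumed; the archived copy lost the original markup).
$html_text = '<html><head><title>Untitled</title></head>
<body>
Blah Blah Blah Blah
</body></html>';
// "s" lets "." match newlines; [^>]* skips any attributes on the body tag.
preg_match("/<body[^>]*>(.*)<\/body>/is", $html_text, $matches);
// $matches[1] holds only the text captured between the body tags.
echo $matches[1];
?>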
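For the case in the original question (read the saved file, keep only what is inside the body), a minimal sketch could look like the one below. The function name extract_body and the file path are made up for illustration, and file_get_contents assumes PHP 4.3 or later.

```php
<?php
// Hypothetical helper: returns the markup between <body ...> and </body>,
// or null when the document has no body element.
function extract_body($html)
{
    // "i" ignores tag case, "s" lets "." match newlines,
    // and the lazy ".*?" stops at the first closing body tag.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Usage sketch: the path below is a placeholder, not a real location.
// $html = file_get_contents('/path/to/uploads/page.htm');
// echo extract_body($html);
```

The result can then be echoed inside whatever shell or template page you are using.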
Here is a class that removes unneeded Word 2000 HTML tags:
http://www.phpclasses.org/browse.html/package/277.html
If you need the styling you will need to do an extra regular expression
to get it out of the head and perhaps put it into a file.
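As a rough sketch of that extra regular expression, something like the following could pull the CSS out of the style blocks. The function name extract_styles and the output file name are only placeholders, and Word may wrap the CSS in HTML comment markers that you would want to strip separately.

```php
<?php
// Hypothetical helper: collects the contents of every <style> block.
function extract_styles($html)
{
    preg_match_all('/<style[^>]*>(.*?)<\/style>/is', $html, $matches);
    // $matches[1] holds the captured CSS of each block, in document order.
    return implode("\n", array_map('trim', $matches[1]));
}

// Usage sketch (the file name is a placeholder):
// file_put_contents('word.css', extract_styles($html));
```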
If you don't need styling I would recommend parsing the document itself
and removing all the class="" and style="" attributes.
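One possible way to remove those attributes is a single preg_replace, sketched below. The function name strip_word_attributes is hypothetical. Because this is a plain regex and not a real HTML parser, it would also remove text that merely looks like such an attribute outside a tag, so treat it as a starting point.

```php
<?php
// Hypothetical helper: drops class=... and style=... attributes.
function strip_word_attributes($html)
{
    // Matches: whitespace, the attribute name, "=", then a double-quoted,
    // single-quoted, or unquoted value (Word output often omits the quotes).
    $pattern = '/\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';
    return preg_replace($pattern, '', $html);
}

// Usage sketch:
// echo strip_word_attributes("<p class=MsoNormal style='margin:0'>Hi</p>");
```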
--
JJ Harrison
[EMAIL PROTECTED]
www.tececo.com
--
Please reply on the list/newsgroup unless the reply is OT.
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php