RE: [PHP] Re: html parsing from html file through php

2002-08-14 Thread vic

Thanks, I will try to adapt this to my needs.

- Vic


-----Original Message-----
From: JJ Harrison [mailto:[EMAIL PROTECTED]] 
Sent: Wednesday, August 14, 2002 5:44 PM
To: [EMAIL PROTECTED]
Subject: [PHP] Re: html parsing from html file through php


> Hello, I am making an app that reads from an HTML file output by MS
> Word (yes, it's for those people who need to make web pages but don't
> know how to write HTML). Anyway, using MS Word is a requirement. After
> the user saves their .doc file as a web page (now an .htm file), the PHP
> script will take that HTML file from a directory on the server, open it,
> read it, and ignore everything from the beginning of the file up to and
> including the opening body tag. Then it must ignore everything at the
> end of the page, from the closing body tag through the closing html tag.
> So basically, after it's done doing its thing, I would have all the
> content of the page ready to be echoed inside another page that would
> be a sort of shell or template.
>
> I am looking right now at regular expressions, fopen(), etc., but
> just to give you an idea and to see if anybody has any helpful pointers,
> this (yes, can you believe it?) is the beginning of the word2html
> translation that MS Word does: (BAH!) (I have to get rid of this,
> remember?)


Here is an example regular expression that someone on this group gave me. It
gives everything between the body tags.


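<?php
$html_text = '<html>
<head>
<title>Untitled</title>
</head>
<body>
Blah Blah Blah Blah
</body>
</html>';
// The "s" modifier lets "." match newlines; $matches[1] holds the body content.
preg_match("/<body[^>]*>(.*)<\/body>/is",$html_text,$matches);
echo $matches[1];
?>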

Here is a class that removes unneeded Word 2000 HTML tags:
http://www.phpclasses.org/browse.html/package/277.html

If you need the styling, you will need an extra regular expression to pull
the style block out of the head, and perhaps save it to a file.
If you don't need styling, I would recommend parsing the document itself and
removing all the class="" and style="" attributes.


--
JJ Harrison
[EMAIL PROTECTED]
www.tececo.com

--
Please reply on the list/newsgroup unless the reply is OT.



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php





[PHP] Re: html parsing from html file through php

2002-08-14 Thread JJ Harrison


> Hello, I am making an app that reads from an HTML file output by MS
> Word (yes, it's for those people who need to make web pages but don't
> know how to write HTML). Anyway, using MS Word is a requirement. After
> the user saves their .doc file as a web page (now an .htm file), the PHP
> script will take that HTML file from a directory on the server, open it,
> read it, and ignore everything from the beginning of the file up to and
> including the opening body tag. Then it must ignore everything at the
> end of the page, from the closing body tag through the closing html tag.
> So basically, after it's done doing its thing, I would have all the
> content of the page ready to be echoed inside another page that would
> be a sort of shell or template.
>
> I am looking right now at regular expressions, fopen(), etc., but
> just to give you an idea and to see if anybody has any helpful pointers,
> this (yes, can you believe it?) is the beginning of the word2html
> translation that MS Word does: (BAH!) (I have to get rid of this,
> remember?)


Here is an example regular expression that someone on this group gave me. It
gives everything between the body tags.


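<?php
$html_text = '<html>
<head>
<title>Untitled</title>
</head>
<body>
Blah Blah Blah Blah
</body>
</html>';
// The "s" modifier lets "." match newlines; $matches[1] holds the body content.
preg_match("/<body[^>]*>(.*)<\/body>/is",$html_text,$matches);
echo $matches[1];
?>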
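
The same idea can be applied to a file saved by Word rather than a
hard-coded string. What follows is only a rough sketch: the function names
extract_body and extract_body_from_file are made up for illustration, and it
assumes the file is small enough to read into memory and has a single body
element.

```php
<?php
// Hypothetical helper: returns whatever sits between <body ...> and </body>,
// or null when no body element is found.
function extract_body(string $html): ?string
{
    // [^>]* skips any attributes on the opening body tag.
    // "s" lets "." match newlines; "i" ignores the case of the tag name.
    if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Hypothetical helper: reads a saved page from disk and extracts its body.
// Returns null when the file is missing, unreadable, or has no body.
function extract_body_from_file(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }
    $html = file_get_contents($path);
    return $html === false ? null : extract_body($html);
}
```

In the shell or template page, the result could then be echoed where the
content belongs, for example
echo extract_body_from_file('/path/to/uploads/page.htm');
where the path is a placeholder, not a real location.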

Here is a class that removes unneeded Word 2000 HTML tags:
http://www.phpclasses.org/browse.html/package/277.html

If you need the styling, you will need an extra regular expression to pull
the style block out of the head, and perhaps save it to a file.
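
Pulling the styling out of the head might look roughly like this. It is a
sketch under assumptions, not a tested recipe for every Word version: the
function name extract_styles is hypothetical, and the code assumes the rules
live in ordinary style elements, possibly wrapped in an HTML comment.

```php
<?php
// Hypothetical helper: collects the contents of every <style> element.
// Returns an empty string when the page has none.
function extract_styles(string $html): string
{
    // Non-greedy (.*?) so each <style>...</style> pair is matched on its own.
    preg_match_all('/<style[^>]*>(.*?)<\/style>/is', $html, $matches);

    $blocks = [];
    foreach ($matches[1] as $css) {
        // Drop a surrounding <!-- --> wrapper if the generator added one.
        $css = preg_replace('/^\s*<!--|-->\s*$/', '', $css);
        $blocks[] = trim($css);
    }
    return implode("\n", $blocks);
}
```

The result could be saved with something like
file_put_contents('word.css', extract_styles($html_text));
and linked from the template. The file name word.css is just an example.
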
If you don't need styling, I would recommend parsing the document itself and
removing all the class="" and style="" attributes.
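
Removing the class and style attributes could be sketched as below. The
function name strip_class_and_style is hypothetical. It is regex-based, so it
assumes attribute values never contain a ">" character; a real HTML parser
would be safer for messy input.

```php
<?php
// Hypothetical helper: removes class and style attributes from opening tags.
function strip_class_and_style(string $html): string
{
    // class=... or style=... with a double-quoted, single-quoted,
    // or unquoted value (e.g. class=MsoNormal).
    $attr = '/\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';

    // Only rewrite text inside opening tags, so body text that happens to
    // contain "class=" is left alone.
    $result = preg_replace_callback(
        '/<[a-z][^>]*>/i',
        function (array $m) use ($attr): string {
            return preg_replace($attr, '', $m[0]) ?? $m[0];
        },
        $html
    );
    return $result ?? $html; // fall back to the input on a regex error
}
```

For example, running it over the extracted body content before echoing it
into the template would leave the tags in place but drop their Word-specific
classes and inline styles.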


--
JJ Harrison
[EMAIL PROTECTED]
www.tececo.com

--
Please reply on the list/newsgroup unless the reply is OT.


