Either ereg_replace(), eregi_replace(), or preg_replace() has a full working
script that does this, returning pretty much plain text.
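As a minimal sketch (not the manual's script, just an illustration), a single preg_replace() with a naive tag pattern can knock out most markup; the sample string here is made up:

```php
<?php
// Naive tag-stripping with a regex. Assumes reasonably well-formed HTML;
// it will not handle things like '>' inside attribute values.
$html = '<p>Hello <b>world</b></p>';
$text = preg_replace('/<[^>]*>/', '', $html);
echo $text; // Hello world
?>
```

For anything messier than simple pages, strip_tags() (below) is usually the safer first pass.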
There's also the strip_tags() function, which strips out all PHP and HTML
tags -- perhaps not enough on its own, since you'd probably want to remove
*some* other stuff as well, but it's a good start, and may be used in
conjunction with the above.
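A quick example of strip_tags(), with a made-up input string; the optional second argument whitelists tags you want to keep:

```php
<?php
$html = '<html><body><h1>Title</h1><p>Some <a href="x">link</a> text.</p></body></html>';

// Remove every HTML and PHP tag.
echo strip_tags($html);          // TitleSome link text.

// Keep <a> tags (attributes are retained on allowed tags).
echo strip_tags($html, '<a>');
?>
```

Note that strip_tags() only deletes the tags themselves, so text from adjacent elements can run together without whitespace.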
You haven't said if you want:
- all the stuff between the body tags, OR
- all the stuff that isn't tags (which would include the title, and perhaps
other elements)
As per usual, specifically asking for what you want helps, but there are
HEAPS of ways of doing this.
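If it's the first option -- only what's between the body tags -- one way (a sketch, with a made-up input) is a case-insensitive preg_match() followed by strip_tags():

```php
<?php
// Grab everything between <body ...> and </body>. The 'i' flag makes it
// case-insensitive, and 's' lets '.' match across newlines.
$html = '<html><head><title>T</title></head><body><p>Body text</p></body></html>';
if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
    echo strip_tags($m[1]); // Body text
}
?>
```

This skips the title and anything else in the head entirely.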
More than likely you'll find/build the components you need in different
places:
- recursively run through a directory for each HTML file
- stripping each HTML file
- possibly presenting the raw text in a TEXTAREA for previewing/modifying
- adding the text to the DB, probably assigning the ID based on the original
filename, or something
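The steps above (minus the TEXTAREA preview) might be roughed out like this -- a hypothetical sketch, with the directory and output file names made up for illustration:

```php
<?php
// Walk a directory of HTML files, strip each one, and write
// "id|text" lines to a flat file. The ID comes from the filename.
$dir = 'pages';                       // hypothetical source directory
$out = fopen('links.txt', 'w');       // hypothetical flat-file output

foreach (glob("$dir/*.html") as $file) {
    $text = trim(strip_tags(file_get_contents($file)));
    $id   = basename($file, '.html'); // ID based on the original filename
    fwrite($out, "$id|$text\n");
}
fclose($out);
?>
```

For nested directories you'd swap the glob() for a recursive function, but the stripping and flat-file steps stay the same.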
on 28/08/02 11:58 PM, Charles Fowler ([EMAIL PROTECTED]) wrote:
> This may be an interesting challenge for someone, or has it been done
> already? Can someone help me?
> I am looking for a labour-saving method of extracting the contents of a
> web page (text only) and dumping the rest of the HTML code.
> I need the contents to rework the pages and put the contents into a flat
> file database. Large, but only two columns of data. Simple to work with
> (no need for a DB) -- they are just a lot of links on a links page.
> Scripts would be welcome.
> Ciao, Carlos
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php