Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.

2002-08-29 Thread Charles Fowler

I was looking into stripping HTML files that contain a lot of links. I
was trying to avoid the manual way of data entry. The contents I need
are the name of each link (the plain text which sits outside the HTML code)
and all the a href tags. I would like the a href (i.e. the hyperlink)
tags to be displayed on the HTML output as plain text. All other HTML
tags would be kept in place.

The reason why I am doing this is that I am placing a link's name and
the http:// link into flat files, where they can be updated just by
appending to them. The script that I have does the rest.

I have looked into the functions suggested but find the concepts and
the use of the operators to strip the HTML esoteric and tricky.

¬¬Chuck¬¬



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.

2002-08-29 Thread Todd Pasley


Hi Charles,

Not sure exactly what you are after, but

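function displayLinks ($pagecontents, $clientid, $testid) {
    // Match <a ...href="URL"...>text</a>; $2 is the URL.
    $search = '/<a (.*?)href="(.*?)"(.*?)<\/a>/im';
    // Point each link at recordstep.php, passing the original URL as "link".
    $replace = '<a $1href="recordstep.php?clientid=' . $clientid
             . '&testid=' . $testid . '&link=$2"$3</a>';
    return (preg_replace ($search, $replace, $pagecontents));
}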

For me, that takes all the links in $pagecontents and modifies the links for
a recorder I am building. You could do something similar with it, although
if you need the name, you might want...

$search = '/<a (.*?)href="(.*?)"(.*?)>(.*?)<\/a>/im';
And $4 would be your name.
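
As a rough sketch of how the name and the URL could be pulled out together, here is a tightened variant of the pattern above used with preg_match_all(). The function name extractLinks and the sample HTML are made up for illustration, and it assumes every href value is double-quoted:

```php
<?php
// Hypothetical helper (not from the thread): collect each link's URL and name.
function extractLinks($pagecontents) {
    // Group 1 = the href value, group 2 = the text between <a ...> and </a>.
    // Assumes double-quoted href attributes.
    $search = '/<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)<\/a>/is';
    $links = array();
    if (preg_match_all($search, $pagecontents, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // strip_tags() removes any markup nested inside the link text.
            $links[] = array('url' => $m[1], 'name' => trim(strip_tags($m[2])));
        }
    }
    return $links;
}

// Made-up sample page fragment.
$html = '<p><a href="http://www.php.net/">PHP</a> and '
      . '<a class="x" href="http://example.com/">An <b>example</b></a></p>';

// Print one "name<TAB>url" line per link.
foreach (extractLinks($html) as $link) {
    echo $link['name'] . "\t" . $link['url'] . "\n";
}
```

Each entry holds the two columns you mentioned, so the loop at the end could just as well append the lines to your flat file.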

I hope this helps.

Todd.


- Original Message -
From: Charles Fowler [EMAIL PROTECTED]
To: Justin French [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, August 30, 2002 10:14 AM
Subject: Re: [PHP] How can I strip the code from HTML pages to extract
the contents of a HTML page.


> I was looking into stripping HTML files that contain a lot of links. I
> was trying to avoid the manual way of data entry. The contents I need
> are the name of each link (the plain text which sits outside the HTML code)
> and all the a href tags. I would like the a href (i.e. the hyperlink)
> tags to be displayed on the HTML output as plain text. All other HTML
> tags would be kept in place.
>
> The reason why I am doing this is that I am placing a link's name and
> the http:// link into flat files, where they can be updated just by
> appending to them. The script that I have does the rest.
>
> I have looked into the functions suggested but find the concepts and
> the use of the operators to strip the HTML esoteric and tricky.
>
> ¬¬Chuck¬¬












Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.

2002-08-28 Thread Justin French

The manual page for one of ereg_replace, eregi_replace, or preg_replace has
a full working script that does this, returning pretty much plain text.

There's also the strip_tags() function, which strips out all PHP and HTML
tags -- perhaps not enough, since you'd maybe want to remove *some* other
stuff too, but it's a good start, and may be used in conjunction with
other stuff.
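
A minimal sketch of strip_tags() on a made-up page, first removing everything and then using its optional second argument to keep only the a tags, so the href values survive:

```php
<?php
// Made-up sample page for illustration.
$html = '<html><head><title>Links</title></head>'
      . '<body><h1>My links</h1><p><a href="http://www.php.net/">PHP</a></p></body></html>';

// Remove every tag, leaving only the text (the title text included).
$text = strip_tags($html);

// Keep only the <a> tags, so the href values are still available afterwards.
$linksOnly = strip_tags($html, '<a>');

echo $text . "\n";
echo $linksOnly . "\n";
```

Note that the text from neighbouring elements runs together once the tags are gone, which is part of why it's only a start.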

You haven't said if you want:

- all the stuff between the body tags OR
- all the stuff that isn't tags (would include the title, and perhaps other
stuff)


As per usual, specifically asking for what you want helps, but there are
HEAPS of ways of doing this.


More than likely you'll find/build the components you need in different
places:

- recursively run through a directory for each HTML file
- stripping each HTML file
- possibly presenting the raw text in a TEXTAREA for previewing/modifying
- adding the text to the DB, probably assigning the ID based on the original
filename, or something

Etc etc
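
To show how a couple of those components might fit together, here is a rough sketch that walks a directory and appends each link to a flat file instead of a DB. The function name harvestLinks and the tab-separated layout are my own assumptions, it expects double-quoted href values, and it leans on the SPL iterators and file_put_contents(), so it needs a reasonably recent PHP (5.3 or later):

```php
<?php
// Hypothetical helper: append one "name<TAB>url" line to $flatFile for every
// link found in the .html/.htm files under $dir. Returns the number of links.
function harvestLinks($dir, $flatFile) {
    $count = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Skip anything that is not an HTML file.
        if (!$file->isFile() || !preg_match('/\.html?$/i', $file->getFilename())) {
            continue;
        }
        $html = file_get_contents($file->getPathname());
        // Group 1 = href value, group 2 = link text (assumes double quotes).
        preg_match_all('/<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)<\/a>/is',
                       $html, $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            $name = trim(strip_tags($m[2]));
            file_put_contents($flatFile, $name . "\t" . $m[1] . "\n", FILE_APPEND);
            $count++;
        }
    }
    return $count;
}
```

Calling something like harvestLinks('/path/to/pages', 'links.txt') would then build the two-column file in one pass; the preview step is left out here.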


Good luck,

Justin



on 28/08/02 11:58 PM, Charles Fowler ([EMAIL PROTECTED]) wrote:

> This may be an interesting challenge for someone, or has it been done
> before?
>
> Can someone help me?
>
> I am looking for a labour-saving method of extracting the contents of a
> web page (text only) and dumping the rest of the HTML code.
>
> I need the contents to rework the pages and put the contents into a flat
> file database. Large, but only two columns of data. Simple to work with
> (no need for a DB) - they are just a lot of links on a links page.
>
> Scripts would be welcome.
>
> Ciao, Carlos
 
 

