You could use fopen() to retrieve the HTML of the page, then parse it,
taking your "best guess" at whether it is a 404 message. That isn't very
robust, though.
It'd be much better to open a socket to the site itself (port 80), then
issue a HEAD request for the page's path, e.g. "HEAD /path/page.html
HTTP/1.0". If the page is not found, the first line of the response (the
status line) will say something like "HTTP/1.0 404 Not Found". After it
come the headers, such as Content-Type:. Because it is a HEAD request,
the server sends only the headers, not the HTML of the 404 page.
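A minimal sketch of reading that status line, assuming the first line of
the response has already been read from the socket into a string. The
function name status_code_from_line() is hypothetical, not part of PHP.
It accepts any HTTP/x.y version, since a server may answer with HTTP/1.1
rather than HTTP/1.0.

```php
<?php
// Hypothetical helper: pull the numeric status code out of an HTTP
// status line such as "HTTP/1.0 404 Not Found".
// Returns null if the line does not look like a status line.
function status_code_from_line($line)
{
    // trim() drops the trailing "\r\n" that ends every header line.
    if (preg_match('#^HTTP/\d+\.\d+\s+(\d{3})#', trim($line), $m)) {
        return (int) $m[1];
    }
    return null;
}
```

Checking the numeric code (404) is more reliable than matching the text
"Not Found", because servers are free to word the reason phrase
differently.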
You'll have to parse the domain name out of the full URL stored in your
database. You can do that by chopping off the leading "http://", if
any, and then splitting at the first "/": the part before it is the host
to connect to, and the rest is the path to request. Use the experimental
socket() function (http://www.php.net/manual/en/function.socket.php) to
make your connection and issue the HEAD request.
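A rough sketch of the whole check, under some assumptions: split_url()
and head_status() are hypothetical names, only plain http:// URLs on
port 80 are handled (no https, no "host:port" in the URL), and it uses
fsockopen(), a different standard PHP function, in place of the
socket() function mentioned above. It also sends a Host: header so that
name-based virtual hosts answer for the right site.

```php
<?php
// Hypothetical helper: split a stored URL into host and path as the
// post describes: drop a leading "http://" if present, then split at
// the first "/". A URL with no path maps to "/".
function split_url($url)
{
    if (strncasecmp($url, 'http://', 7) === 0) {
        $url = substr($url, 7);
    }
    $slash = strpos($url, '/');
    if ($slash === false) {
        return array($url, '/');
    }
    return array(substr($url, 0, $slash), substr($url, $slash));
}

// Hypothetical helper: send "HEAD <path> HTTP/1.0" to port 80 and return
// the status code from the first response line, or null on failure.
// Assumes plain HTTP on port 80; uses fsockopen() rather than socket().
function head_status($url, $timeout = 10)
{
    list($host, $path) = split_url($url);
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return null; // could not connect (bad host, DNS failure, timeout)
    }
    // The Host header lets name-based virtual hosts pick the right site.
    fwrite($fp, "HEAD $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $status = fgets($fp, 1024); // first line, e.g. "HTTP/1.0 404 Not Found"
    fclose($fp);
    if ($status !== false && preg_match('#^HTTP/\d+\.\d+\s+(\d{3})#', $status, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

To build the "bad links" file, you could loop over the URLs from your
database, call head_status() on each, and write every URL that returns
404 (or null, if you also want unreachable hosts) to a file with
fopen() and fwrite(). Note that some servers answer HEAD poorly, so a
URL flagged this way may be worth rechecking with a GET before you
delete it.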
Hope that helps...
Justin Buist
Trident Technology, Inc.
4700 60th St. SW, Suite 102
Grand Rapids, MI 49512
Ph. 616.554.2700
Fx. 616.554.3331
Mo. 616.291.2612
On Sat, 8 Sep 2001, Larry "RedCobra" Linthicum wrote:
> I have a database with lots of urls stored
>
> I would like to build a script that retrieves them ... then somehow checks
> each one to see if the link produces an "error 404" and is no longer good,
> then ideally write a file of "bad links" for me
>
> This seems like it would be possible with PHP. Could someone give me some
> hints on where to start? I know very little about HTTP headers and such.
>
> any information will be appreciated
>
>
>
> --
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]
>