>> preg_match('/(<!DOCTYPE.+<\/html>)/ms', $http_response, $html)
>>
>> ...puts it into $html[1]. Adjust to suit your local standards-compliance
>> practices. You can also look for everything after the first instance of
>> "\n\n".
>
>That's assuming that the person used the <doctype><html> for the first
>item. What about extra space, SSI, etc...
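SSI directives are processed by the server before anything goes out on
port 80, so they never show up in the output, and anything before the
doctype is essentially junk anyway. Like I said, suit to taste or just pull
from after the first blank line (strictly "\r\n\r\n", though some servers
send a bare "\n\n") -- no regexp needed in that case.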
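
For anyone who wants the no-regexp route spelled out, here is a minimal
sketch. body_after_headers() is a made-up name, not something from the
proxy code, and it assumes the whole response is already in one string. It
takes whichever blank-line separator appears first, so it copes with both
CRLF and bare-LF servers.

```php
<?php
// Hypothetical helper: return everything after the first blank line of a
// raw HTTP response, or false if the headers never ended.
function body_after_headers($http_response) {
    $best = false;
    // Well-behaved servers end headers with CRLF CRLF; some send LF LF.
    foreach (array("\r\n\r\n", "\n\n") as $sep) {
        $pos = strpos($http_response, $sep);
        // Keep whichever separator occurs earliest in the response.
        if ($pos !== false && ($best === false || $pos < $best[0])) {
            $best = array($pos, strlen($sep));
        }
    }
    if ($best === false) {
        return false;
    }
    return substr($http_response, $best[0] + $best[1]);
}

$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>";
echo body_after_headers($raw), "\n"; // prints "<html>hi</html>"
```

Note that this says nothing about whether the body is complete, which is
the one thing the regexp version does check.
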
Personally, I'm using the regexp method, because I'm using the above code
in a context where the server is flaky. I need to repeat the request
immediately if PHP craps out and dies before the '</html>' is output. It's
a proxy that allows for the possibility of segfaults and premature exits
and such.
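
To make the "did it finish?" check concrete, here is a small sketch using
the same pattern. complete_document() and the two sample responses are
invented for illustration. Note the pattern is case-sensitive, so it
expects an uppercase "<!DOCTYPE" and a lowercase "</html>".

```php
<?php
// Hypothetical helper: return the full document if the response contains
// everything from the doctype through </html>, otherwise false.
function complete_document($http_response) {
    if (preg_match('/(<!DOCTYPE.+<\/html>)/ms', $http_response, $matches)) {
        return $matches[1];
    }
    return false;
}

// A response that finished, and one where the server died mid-page.
$good = "HTTP/1.0 200 OK\r\n\r\n<!DOCTYPE html>\n<html><body>ok</body></html>\n";
$cut  = "HTTP/1.0 200 OK\r\n\r\n<!DOCTYPE html>\n<html><body>o";

var_dump(complete_document($good) !== false); // bool(true)
var_dump(complete_document($cut));            // bool(false)
```
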
It looks like this, if you are curious. The usleep() gives the server an
increasing amount of chill-out time as the number of attempts is incremented:
---------------------------------------------------------------------
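// $attempt (starting at 0) and $max_tries are assumed to be set before this loop.
do {
    // Back off: pause 4^$attempt microseconds, so each retry waits longer.
    usleep(pow(4, $attempt));
    // Reset on every pass so a failed connect never leaves stale or undefined data.
    $html = '';
    if ($fp = @fsockopen($_SERVER['HTTP_HOST'], 80, $errno, $errstr, 30)) {
        // Ask the same host for view.php in place of proxy.php.
        fputs($fp, sprintf("GET %s?fresh=fresh HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n",
            str_replace('proxy.php', 'view.php', $_SERVER['REQUEST_URI']),
            $_SERVER['HTTP_HOST']));
        // Read the whole response, headers included.
        while (!feof($fp)) $html .= fgets($fp, 128);
        fclose($fp);
    }
// Retry until a complete document (doctype through </html>) comes back.
} while ((++$attempt <= $max_tries)
    && !preg_match('/(<!DOCTYPE.+<\/html>)/ms', $html, $matches));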
---------------------------------------------------------------------
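
For a feel of how much chill-out time pow(4, $attempt) actually buys, this
sketch prints the pause before each attempt. backoff_usec() is a made-up
wrapper, and the upper bound of 10 is only an assumed value for
$max_tries, which the code above does not show.

```php
<?php
// Hypothetical wrapper around the same rule the loop passes to usleep().
function backoff_usec($attempt) {
    return pow(4, $attempt); // microseconds
}

// 10 is an assumed stand-in for $max_tries.
for ($attempt = 0; $attempt <= 10; $attempt++) {
    $usec = backoff_usec($attempt);
    printf("attempt %2d: %8d usec (%.6f s)\n", $attempt, $usec, $usec / 1e6);
}
```

The first few pauses are tiny (1, 4, 16, 64 microseconds), attempt 5 waits
about a millisecond, and attempt 10 is the first to wait over a second
(1048576 usec). The PHP manual notes that usleep() values above 1000000
may not be supported on every operating system, so a much larger
$max_tries would probably want sleep() for the long pauses.
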
michal migurski- contact info and pgp key:
sf/ca http://mike.teczno.com/contact.html
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php