>> preg_match('/(<!DOCTYPE.+<\/html>)/ms', $http_response, $html)
>>
>> ...puts it into $html[1]. Adjust to suit your local standards-compliance
>> practices. You can also look for everything after the first instance of
>> "\n\n".
>
>That's assuming that the person used the <doctype><html> for the first
>item. What about extra space, SSI, etc.?

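SSIs are expanded by the server, so they never show up in what comes back
on port 80, and anything before the doctype is essentially junk anyway.
Like I said, suit to taste, or just pull everything after the blank line
that ends the headers. Strictly that's the first "\r\n\r\n", since HTTP
uses CRLF line endings; a bare "\n\n" only turns up from sloppy servers.
No regexp needed in that case.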
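If you'd rather skip the regexp, here's a minimal sketch of the split-at-the-blank-line approach. `split_http_response()` is a made-up helper name, and it assumes `$raw` holds the whole response as read from the socket. It looks for the earliest "\r\n\r\n" and falls back to a bare "\n\n".

```php
<?php
// Sketch: split a raw HTTP response into [headers, body] at the first
// blank line. Proper HTTP ends the headers with "\r\n\r\n"; a bare "\n\n"
// is accepted too, and whichever separator appears first wins.
function split_http_response(string $raw): array
{
    $best = null; // [position, length] of the earliest separator found
    foreach (["\r\n\r\n", "\n\n"] as $sep) {
        $pos = strpos($raw, $sep);
        if ($pos !== false && ($best === null || $pos < $best[0])) {
            $best = [$pos, strlen($sep)];
        }
    }
    if ($best === null) {
        return [$raw, '']; // no blank line: treat it all as headers
    }
    return [substr($raw, 0, $best[0]), substr($raw, $best[0] + $best[1])];
}

$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE html><html></html>";
[$headers, $body] = split_http_response($raw);
echo $body; // prints <!DOCTYPE html><html></html>
```

Unlike the regexp, this doesn't care what the body looks like, so it won't notice a page that got cut off halfway.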

Personally, I use the regexp method because I'm running the above code
somewhere the server is flaky: if PHP craps out and dies before the
'</html>' is output, I need to repeat the request immediately. It's a
proxy that allows for the possibility of segfaults, premature exits
and such.

It looks like this, if you are curious. The usleep() gives the server an
increasing amount of chill-out time as the number of attempts goes up:
---------------------------------------------------------------------
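// Assumes $attempt (e.g. 0) and $max_tries were initialized before this loop.
do {
    // Wait 4^$attempt microseconds before each request: 1, 4, 16, 64...
    usleep(pow(4, $attempt));

    $html = '';
    if ($fp = @fsockopen($_SERVER['HTTP_HOST'], 80, $errno, $errstr, 30)) {
        // Re-request the page through view.php; HTTP wants CRLF line endings.
        fputs($fp, sprintf("GET %s?fresh=fresh HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n",
                           str_replace('proxy.php', 'view.php', $_SERVER['REQUEST_URI']),
                           $_SERVER['HTTP_HOST']));

        // Read the raw response (headers and body) until the server hangs up.
        while (!feof($fp)) $html .= fgets($fp, 128);
        fclose($fp);
    }

// Try again unless the page arrived intact, from <!DOCTYPE through </html>.
} while ((++$attempt <= $max_tries) && !preg_match('/(<!DOCTYPE.+<\/html>)/ms', $html, $matches));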
---------------------------------------------------------------------
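For what it's worth, the same retry-until-complete idea can be pulled out into a function so the socket code and the "did the page finish?" test live apart. This is only a sketch: `fetch_complete_html()` and the `$fetch` callable are hypothetical names, and in the proxy above `$fetch` would wrap the fsockopen()/fgets() block. usleep() counts in microseconds, so these waits are tiny.

```php
<?php
// Sketch: call $fetch (any callable returning the raw response, '' on
// failure) up to $maxTries times, backing off 4^$attempt microseconds
// before each try, until the result contains <!DOCTYPE ... </html>.
function fetch_complete_html(callable $fetch, int $maxTries): ?string
{
    for ($attempt = 0; $attempt < $maxTries; $attempt++) {
        usleep(pow(4, $attempt)); // 1, 4, 16, 64... microseconds
        $raw = $fetch();
        if (preg_match('/(<!DOCTYPE.+<\/html>)/ms', $raw, $matches)) {
            return $matches[1]; // the document from DOCTYPE through </html>
        }
    }
    return null; // every attempt came back truncated or empty
}

// Fake server that dies mid-page on the first two requests.
$calls = 0;
$flaky = function () use (&$calls): string {
    $calls++;
    return $calls < 3
        ? "HTTP/1.0 200 OK\r\n\r\n<!DOCTYPE html><html><body>"
        : "HTTP/1.0 200 OK\r\n\r\n<!DOCTYPE html><html><body></body></html>";
};
$doc = fetch_complete_html($flaky, 5); // succeeds on the third call
```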
michal migurski- contact info and pgp key:
sf/ca            http://mike.teczno.com/contact.html


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
