>> preg_match('/(<!DOCTYPE.+<\/html>)/ms', $http_response, $html)
>>
>> ...puts it into $html[1]. Adjust to suit your local standards-compliance
>> practices. You can also look for everything after the first instance of
>> "\n\n".
>
>Thats assuming that the person used the <doctype><html> for the first
>item. what bout extra space, SSI, etc...

SSI directives are processed on the server, so they wouldn't show up in
anything served on port 80, and anything before the doctype is essentially
junk anyway. Like I said, suit to taste, or just pull everything after the
first blank line ("\r\n\r\n" on the wire) -- no regexp needed in that case.

Personally, I'm using the regexp method, because I'm using the above code
in a context where the server is flaky. I need to repeat the request
immediately if PHP craps out and dies before the '</html>' is output. It's
a proxy that allows for the possibility of segfaults, premature exits and
such.

It looks like this, if you are curious. The usleep() gives the server an
increasing amount of chill-out time as the number of attempts goes up:

---------------------------------------------------------------------
// $attempt (starting at 0) and $max_tries are set earlier in the script.
do {
    // Back off a little longer each time: 1, 4, 16, 64... microseconds.
    usleep(pow(4, $attempt));

    // Reset on every pass so a failed connect can't leave stale output.
    $html = '';

    if($fp = @fsockopen($_SERVER['HTTP_HOST'], 80, $errno, $errstr, 30)) {

        // Ask view.php for a fresh copy; HTTP wants CRLF line endings.
        fputs($fp, sprintf("GET %s?fresh=fresh HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n",
                           str_replace('proxy.php', 'view.php', $_SERVER['REQUEST_URI']),
                           $_SERVER['HTTP_HOST']));

        // Read the whole response, headers included.
        while(!feof($fp))
            $html .= fgets($fp, 128);

        fclose($fp);
    }

// Retry until a complete document (doctype through </html>) arrives.
} while((++$attempt <= $max_tries) && !preg_match('/(<!DOCTYPE.+<\/html>)/ms', $html, $matches));
---------------------------------------------------------------------

michal migurski- contact info and pgp key:
sf/ca            http://mike.teczno.com/contact.html

-- 
PHP General Mailing List (http://www.php.net/)
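
For comparison, here is a rough sketch of the two extraction approaches discussed in this thread, run against canned responses instead of a live socket. The helper names body_after_headers() and complete_document() and the sample responses are made up for illustration and are not part of the proxy code. The split version accepts both "\r\n\r\n" and a bare "\n\n" as the blank line, on the assumption that a sloppy server might send either.

```php
<?php
// Hypothetical helper: everything after the first blank line, i.e. the body.
// Returns null when no header/body separator is found.
function body_after_headers(string $response): ?string
{
    // Split once on the first blank line, tolerating CRLF or bare LF.
    $parts = preg_split('/\r?\n\r?\n/', $response, 2);
    return count($parts) === 2 ? $parts[1] : null;
}

// Hypothetical helper: the document from the doctype through the last
// </html>, or null when the response was cut off before </html>.
function complete_document(string $response): ?string
{
    if (preg_match('/(<!DOCTYPE.+<\/html>)/ms', $response, $matches)) {
        return $matches[1];
    }
    return null;
}

// Made-up sample responses: one complete, one truncated mid-body.
$headers = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n";
$good    = $headers . "<!DOCTYPE html>\n<html><body>hi</body></html>\n";
$cut     = $headers . "<!DOCTYPE html>\n<html><body>hi";

var_dump(body_after_headers($cut));  // a body comes back even though it is truncated
var_dump(complete_document($cut));   // NULL, which is the cue to retry
```

The split is simpler, but it can't tell a complete page from a truncated one, which is why the regexp version suits the retry loop better.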