* Thus wrote Kevin Stone ([EMAIL PROTECTED]):
> 
> "Paul Van Schayck" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> > [EMAIL PROTECTED] (Kevin Stone) wrote
> > Hello Kevin.
> > 
> > > This is just a thought.. I have never employed this method
> > > personally.. but I suppose you could read the page into a string and
> > > use the md5() function to generate a hexadecimal value based on that
> > > string.  Store the hex value in a database and compare it against the
> > > new value generated the next day.  If anything on the page has been
> > > modified the values should not match.  Even the most minute change
> > > should trigger a new value.  Obviously you won't know *what* has been
> > > modified, only that the page *has* been modified.
> > >
  [...]
> > 
> > Too many pitfalls and too slow! Really, a socket connection that only
> > retrieves the headers is the best way.
> > 
> > This function is what you need:
> > 
> > function fileStamp($domain, $file)
> > {
  [...]
> > return strtotime($time);
> > } 
> > 
> 
> Slow?  Hogwash.  You're pining over microseconds.  Besides, most of the time is taken
> opening the file which you're doing anyway.  Except that the socket method relies on 
> header information that may or may not be there.  I agree it would be ideal if you 
> could use that information but your fileStamp() function isn't going to work for all 
> files on all servers.
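For comparison, Kevin's md5() idea could look roughly like this. It is only an illustration: pageFingerprint() and pageChanged() are hypothetical names, and it assumes you already have the page body (e.g. from file_get_contents()) and the previous hash from storage. It also shows the trade-off being argued about: the whole page must be downloaded on every check, and any byte that changes on every request (a generated timestamp, a session id in a link) changes the hash as well.

```php
<?php
// Kevin's idea: hash the whole page and compare with the previous hash.
// pageFingerprint() and pageChanged() are hypothetical helper names.

function pageFingerprint(string $body): string
{
    // md5() returns a 32-character hexadecimal string.
    return md5($body);
}

function pageChanged(string $body, ?string $storedHash): bool
{
    // No stored hash yet (first run): treat the page as changed.
    if ($storedHash === null) {
        return true;
    }
    return pageFingerprint($body) !== $storedHash;
}

// Usage: in real code $body would come from e.g. file_get_contents($url)
// and the stored hash from your database.
$yesterday = pageFingerprint('<html>old</html>');
var_dump(pageChanged('<html>old</html>', $yesterday)); // bool(false)
var_dump(pageChanged('<html>new</html>', $yesterday)); // bool(true)
```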
 
OK, no need to argue here.  Neither method is correct or the most
efficient.  If you want to keep a copy of the file, use the GET
method; if you only want to check whether it has been modified, use
the HEAD method.

The first time you get a page, pay attention to these headers:
  ETag:
  Last-Modified:
  Cache-Control:  (cache directives)
  Expires:
  Content-Length:

Store these values somewhere.
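A minimal sketch of pulling those headers out of a raw response so they can be stored. parseHeaders() and validatorsToStore() are hypothetical helpers, not PHP built-ins. I have also kept the Date header, which the clock-offset calculation in the next paragraph needs. As a simplification, a repeated header overwrites the earlier one instead of being joined with a comma.

```php
<?php
// Hypothetical helpers: parse a raw response header block and keep only
// the headers worth storing for the next check.

function parseHeaders(string $rawHeaders): array
{
    $headers = [];
    // Split on CRLF (or a bare LF); the first line is the status line.
    $lines = preg_split('/\r\n|\n/', trim($rawHeaders));
    array_shift($lines);
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "Name: value" line, skip it
        }
        // Header names are case-insensitive, so normalise them to lower case.
        $name = strtolower(trim(substr($line, 0, $pos)));
        // Simplification: a repeated header overwrites the earlier one.
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

function validatorsToStore(array $headers): array
{
    // The headers listed above, plus Date for the clock-offset calculation.
    $wanted = ['etag', 'last-modified', 'cache-control', 'expires',
               'content-length', 'date'];
    return array_intersect_key($headers, array_flip($wanted));
}

// Usage: $row = validatorsToStore(parseHeaders($headerBlock));
// then save $row (e.g. as JSON) alongside the URL.
```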

Then, when checking whether the document has changed or should be
re-requested:

  If the document has expired, re-request it to see whether it has
  changed; otherwise you can safely assume it is the same.  Note that
  the Expires time is in the server's clock, so when you retrieve the
  document, record the difference between the server's Date header
  and your own clock, and use that offset in the calculation.
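A minimal sketch of that Expires check, assuming you saved the Expires and Date headers plus the local time of the fetch (all as Unix seconds once parsed). clockOffset() and hasExpired() are hypothetical helpers; strtotime() does the date parsing. An Expires value that strtotime() cannot parse is treated as already expired, which is how RFC 2616 (section 14.21) asks clients to treat invalid dates.

```php
<?php
// Hypothetical helpers for the Expires check with clock-skew correction.

function clockOffset(string $serverDate, int $localFetchTime): int
{
    $serverTime = strtotime($serverDate);
    // Without a usable Date header, assume the two clocks agree.
    // Positive result: the server clock is ahead of ours.
    return $serverTime === false ? 0 : $serverTime - $localFetchTime;
}

function hasExpired(string $expires, int $offset, int $localNow): bool
{
    $expiresAt = strtotime($expires);
    if ($expiresAt === false) {
        // Unparseable Expires: treat the document as already expired.
        return true;
    }
    // Convert our "now" into server time before comparing.
    return $localNow + $offset >= $expiresAt;
}

// Usage, with values saved at fetch time:
// $offset = clockOffset($stored['date'], $fetchedAt);
// if (hasExpired($stored['expires'], $offset, time())) { /* re-request */ }
```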

  If you don't have a Last-Modified or ETag value, the document
  *should* be considered modified!

  Observe any cache directives (Cache-Control); in HTTP/1.1 a
  max-age directive overrides Expires.

  If the URL has a query string, you must check whether the document
  has been modified.
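The rules above can be sketched as one decision function. nextAction() and its return strings ('refetch', 'revalidate', 'use-cached') are hypothetical; $stored is assumed to hold the lower-cased headers saved from the previous response, and $expired the result of the Expires check. Only the no-cache directive is handled here; max-age is left out to keep the sketch short.

```php
<?php
// Hypothetical decision helper for "do I need to ask the server again?".

function nextAction(array $stored, string $url, bool $expired): string
{
    // No validators: nothing to compare against, so assume it changed.
    if (!isset($stored['etag']) && !isset($stored['last-modified'])) {
        return 'refetch';
    }

    // Cache-Control: no-cache forbids reusing the copy without checking.
    $cc = strtolower($stored['cache-control'] ?? '');
    if (preg_match('/\bno-cache\b/', $cc)) {
        return 'revalidate';
    }

    // URLs with a query string are always checked.
    if (parse_url($url, PHP_URL_QUERY) !== null) {
        return 'revalidate';
    }

    // Otherwise, only ask again once the stored copy has expired.
    return $expired ? 'revalidate' : 'use-cached';
}
```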

To get the document:
  (HEAD|GET) $url_value HTTP/1.1
  Host: $host
  ...other misc headers
  
  If you have an ETag, add the request header
    If-None-Match: $etag_value

  If you have a Last-Modified value, add the request header
    If-Modified-Since: $last_modified_value

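  Don't send a Content-Length request header: on a HEAD or GET it
  would describe a request body, and there isn't one.  If you stored
  a Content-Length, compare it with the one in the new response
  instead; a different length means the document changed.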
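A sketch of building that conditional request text from the stored headers. buildConditionalRequest() is a hypothetical helper, $stored is assumed to hold lower-cased header names, and the Connection: close header is my own addition so the server closes the socket after replying. Note that the request line and headers are separated by CRLF, and a single empty line ends the headers.

```php
<?php
// Hypothetical helper: build a conditional HEAD or GET request.

function buildConditionalRequest(string $method, string $url, array $stored): string
{
    $parts = parse_url($url);
    // Request target: path plus query string, defaulting to "/".
    $path = ($parts['path'] ?? '/')
          . (isset($parts['query']) ? '?' . $parts['query'] : '');
    $host = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    $lines = ["$method $path HTTP/1.1", "Host: $host", 'Connection: close'];
    if (isset($stored['etag'])) {
        $lines[] = 'If-None-Match: ' . $stored['etag'];
    }
    if (isset($stored['last-modified'])) {
        $lines[] = 'If-Modified-Since: ' . $stored['last-modified'];
    }
    // Headers end with an empty line (CRLF CRLF).
    return implode("\r\n", $lines) . "\r\n\r\n";
}
```

To send it, you could open a socket with fsockopen($host, 80, $errno, $errstr), fwrite() the request, and read the reply with fgets()/fread() until feof().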

The response:

  HTTP/1.1 304 Not Modified
    - woot.. it isn't modified.
    
  HTTP/1.1 200 OK
    - it should be considered modified

  [other responses could be returned]
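A small sketch of mapping the status line onto those two cases. statusCode() and interpretStatus() are hypothetical helpers. Anything other than 304 or 200 (a redirect, a 404, and so on) comes back as 'other' rather than being guessed at.

```php
<?php
// Hypothetical helpers: read the status code and classify the reply.

function statusCode(string $statusLine): ?int
{
    // e.g. "HTTP/1.1 304 Not Modified"
    if (preg_match('#^HTTP/\d\.\d (\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null; // not a valid status line
}

function interpretStatus(string $statusLine): string
{
    switch (statusCode($statusLine)) {
        case 304:
            return 'not-modified';
        case 200:
            return 'modified';
        default:
            return 'other';
    }
}
```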

If a GET was requested, the document follows the headers.  And
well, that's all there is to it. Assignment is due next week :)
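A sketch of separating the headers from the document in a raw reply read from the socket. splitResponse() is a hypothetical helper. It splits at the first empty line (CRLF CRLF) and does not decode "Transfer-Encoding: chunked", which an HTTP/1.1 server may use for the body.

```php
<?php
// Hypothetical helper: split a raw HTTP response into [headers, body].

function splitResponse(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        // No header terminator (e.g. a truncated read): all of it is headers.
        return [$raw, ''];
    }
    // Skip the 4-byte CRLF CRLF separator; a HEAD reply leaves an empty body.
    return [substr($raw, 0, $pos), substr($raw, $pos + 4)];
}
```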
  

Reference: RFC 2616 (HTTP/1.1), http://www.w3.org/Protocols/rfc2616/rfc2616.html

HTH.

Curt
-- 
"My PHP key is worn out"

  PHP List stats since 1997: 
          http://zirzow.dyndns.org/html/mlists/

-- 
PHP General Mailing List (http://www.php.net/)