RE: [PHP] Finding out when a Web page has changed

2002-09-26 Thread John Holmes

You could cache/save the actual contents of the file, then when you read
it next time, compare it to what you saved and see if it changed. You
may want to filter out everything but what's between <body> and </body>,
so you're not thinking it changed just b/c of something in the
headers...
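
A minimal sketch of the save-and-compare idea, assuming the page has already been fetched into a string. The function names `extract_body()` and `body_changed()` are made up for illustration, and the regular expression is a rough filter, not a real HTML parser:

```php
<?php
// Hypothetical helper: return only the markup between <body> and </body>,
// or the whole document when no body element is found.
function extract_body(string $html): string
{
    if (preg_match('~<body\b[^>]*>(.*?)</body\s*>~is', $html, $m) === 1) {
        return $m[1];
    }
    return $html;
}

// Hypothetical helper: compare the saved copy with the freshly fetched copy,
// looking only at the body so that changes in the head section are ignored.
function body_changed(string $saved, string $fresh): bool
{
    return extract_body($saved) !== extract_body($fresh);
}
```

In a cron script the fresh copy could come from `file_get_contents($url)` (which needs `allow_url_fopen` enabled) and the saved copy from a local cache file written with `file_put_contents()` on the previous run.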

---John Holmes...

 -Original Message-
 From: Vikram Vaswani [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, September 26, 2002 7:04 AM
 To: [EMAIL PROTECTED]
 Subject: [PHP] Finding out when a Web page has changed
 
 Hi all,
 
 I need to write an application that accepts a list of URLs and checks them
 on a daily basis (via cron) to see if the pages have changed in the past
 day.
 
 I need some help with this. Does anyone know the best way to find
 out when a particular Web page has been modified? I am thinking about using
 the Last-Modified: HTTP header - however, not all servers return this
 header - any ideas on what the fallback should be?
 
 TIA,
 
 Vikram
 --
 I find your lack of faith disturbing.
   --Darth Vader
 
 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php







Re: [PHP] Finding out when a Web page has changed

2002-09-26 Thread Marek Kilimajer

Hope the sites have no banners :), they change all the time.




Re: [PHP] Finding out when a Web page has changed

2002-09-26 Thread Erwin

Marek Kilimajer wrote:
 Hope the sites have no banners :),  they change all the time

But the URL to the banners will be the same, so that's no change in the HTML
code ;-))

 [SNIP]






Re: [PHP] Finding out when a Web page has changed

2002-09-26 Thread Justin French

Same with sites that have negligible daily changes (like today's date
dynamically inserted), or random changes (a random quote, tip, stock quote,
product, image, etc. would all screw that up).

Justin


on 26/09/02 11:03 PM, Marek Kilimajer ([EMAIL PROTECTED]) wrote:

 Hope the sites have no banners :),  they change all the time
 




RE: [PHP] Finding out when a Web page has changed

2002-09-26 Thread John Holmes

Yeah, true. Maybe you could just ereg() out the content. Each URL would
need its own ereg, though, so it won't be as easy to set up.

But, technically, if the quote changes, then the page has been updated,
even if it's dynamic. How do you define "updated"?
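
A rough sketch of the one-pattern-per-URL idea. Note that `ereg()` was removed in PHP 7.0, so this uses `preg_match()` instead. The URL, the pattern, and the function name `extract_content()` are all invented for illustration; real pages would each need a hand-written pattern:

```php
<?php
// Hypothetical configuration: one PCRE pattern per URL, each capturing
// the part of the page that counts as "content" in group 1.
$patterns = [
    'http://example.com/news' => '~<div id="content">(.*?)</div>~is',
];

// Return the captured content for $url, or the whole page when no pattern
// is configured for that URL or the pattern does not match.
function extract_content(array $patterns, string $url, string $html): string
{
    if (isset($patterns[$url]) && preg_match($patterns[$url], $html, $m) === 1) {
        // Fall back to the full match if the pattern has no capture group.
        return $m[1] ?? $m[0];
    }
    return $html;
}
```

Comparing only the extracted part means a rotating quote or banner outside it would not count as an update, which is one possible answer to the "how do you define updated" question.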

---John Holmes...





Re: [PHP] Finding out when a Web page has changed

2002-09-26 Thread Vikram Vaswani


You could calculate and store the MD5 hash of the page. If the hash is
different the next day, you know the page has been modified.
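
A small sketch of the store-and-compare-hash suggestion, assuming the hashes are kept in a JSON file between cron runs. The file layout and the names `load_hashes()` and `hash_changed()` are assumptions made for this example:

```php
<?php
// Hypothetical store: a JSON file mapping URL => MD5 hash from the last run.
function load_hashes(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $data = json_decode((string) file_get_contents($path), true);
    return is_array($data) ? $data : [];
}

// Record the hash of $content for $url and report whether it differs from
// the hash stored on the previous run. A URL seen for the first time has
// nothing to compare against, so it is reported as not changed.
function hash_changed(array &$hashes, string $url, string $content): bool
{
    $new = md5($content);
    $changed = isset($hashes[$url]) && $hashes[$url] !== $new;
    $hashes[$url] = $new;
    return $changed;
}
```

After checking all URLs, the updated array can be written back with `file_put_contents($path, json_encode($hashes))`. Storing 32-character hashes instead of full page copies keeps the cache small, but the page body still has to be downloaded to compute each hash.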

Does this mean that I need to read the entire contents of the HTTP stream
into a variable, calculate the hash and store it for comparison? Or is
there an easier way to get the MD5 hash?

Basically, I'm wondering if there is a way to do this without having to
read the entire URL contents via HTTP.
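
One possible way to skip the download when the server cooperates is to look at the response headers first and only fall back to hashing the body when Last-Modified is missing. The sketch below covers just the decision step; the function names and the 'changed' / 'unchanged' / 'fallback' return values are invented for illustration:

```php
<?php
// Hypothetical helper: find Last-Modified in raw header lines (the format
// returned by get_headers($url)) and return it as a Unix timestamp, or null
// when the header is absent or unparseable. The last match wins, so that
// after redirects the final response is the one considered.
function last_modified(array $headerLines): ?int
{
    $found = null;
    foreach ($headerLines as $line) {
        if (is_string($line) && stripos($line, 'Last-Modified:') === 0) {
            $ts = strtotime(trim(substr($line, strlen('Last-Modified:'))));
            $found = ($ts === false) ? null : $ts;
        }
    }
    return $found;
}

// Decide what to do for one URL. $lastCheck is the Unix time of the previous
// run. Returns 'changed', 'unchanged', or 'fallback' (no usable header, so
// the caller has to fetch the body and compare it some other way).
function header_verdict(array $headerLines, int $lastCheck): string
{
    $ts = last_modified($headerLines);
    if ($ts === null) {
        return 'fallback';
    }
    return $ts > $lastCheck ? 'changed' : 'unchanged';
}
```

On PHP 7.1 or later the header lines could be fetched without the body via `get_headers($url, false, stream_context_create(['http' => ['method' => 'HEAD']]))`. Dynamic pages frequently omit Last-Modified, so the fallback path is likely to be hit often.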

Vikram





RE: [PHP] Finding out when a Web page has changed

2002-09-26 Thread Vikram Vaswani

Yup, thought of that one - but it just seems a little sub-optimal ;) Any
way to do this without resorting to a brute-force read of the entire file
stream over HTTP?

I can't think of one, so I guess I may do it this way after all - but if
something occurs to you, or anyone else on the list, please let me know.

Thanks for the help :)

Vikram
