RE: [PHP] Finding out when a Web page has changed
Yup, thought of that one - but it just seems a little sub-optimal ;)

Any way to do this without resorting to a brute-force "read the entire file stream over HTTP"? I can't think of one, so I guess I may do it this way after all - but if something occurs to you, or anyone else on the list, please let me know.

Thanks for the help :)

Vikram

>You could cache/save the actual contents of the file, then when you read
>it next time, compare it to what you saved and see if it changed. You
>may want to filter out everything but what's between <body> and </body>,
>so you're not thinking it changed just b/c of something in the
>headers...
>
>---John Holmes...
>
>> -----Original Message-----
>> From: Vikram Vaswani [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, September 26, 2002 7:04 AM
>> To: [EMAIL PROTECTED]
>> Subject: [PHP] Finding out when a Web page has changed
>>
>> Hi all,
>>
>> I need to write an application that accepts a list of URLs and checks them
>> on a daily basis (via cron) to see if the pages have changed in the past
>> day.
>>
>> I need some help with this. Does anyone know the most optimal way to find
>> out when a particular Web page has been modified? I am thinking about using
>> the Last-Modified: HTTP header - however, not all servers return this
>> header - any ideas on what the fallback should be?
>>
>> TIA,
>>
>> Vikram
>> --
>> "I find your lack of faith disturbing."
>> --Darth Vader

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Finding out when a Web page has changed
>> I need to write an application that accepts a list of URLs and checks them
>> on a daily basis (via cron) to see if the pages have changed in the past day.
>>
>> I need some help with this. Does anyone know the most optimal way to find
>> out when a particular Web page has been modified? I am thinking about using
>> the Last-Modified: HTTP header - however, not all servers return this
>> header - any ideas on what the fallback should be?
>
>You could calculate and store the MD5 hash of the page. If the hash is
>different the next day, you know the page has been modified.

Does this mean that I need to read the entire contents of the HTTP stream into a variable, calculate the hash and store it for comparison? Or is there an easier way to get the MD5 hash?

Basically, I'm wondering if there is a way to do this without having to read the entire URL contents via HTTP.

Vikram
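For what it's worth, here is a rough sketch of the MD5 suggestion. The hash is computed over the bytes you fetch, so the whole page still has to come down over HTTP; md5() only saves you from storing the full contents. The function name check_page() and the $hashes map (URL => stored hash, to be persisted between cron runs however you like) are made up for illustration, and the sketch assumes PHP 7.1 or later.

```php
<?php
// Sketch only: $hashes is a hypothetical url => md5 map that the caller
// persists between cron runs (flat file, database, ...).

// Compare the page's current MD5 with the stored one and record the new value.
// Returns [bool $changed, array $updatedHashes]. A URL seen for the first time
// is reported as unchanged because there is nothing to compare against yet.
function check_page(array $hashes, string $url, string $html): array
{
    $current = md5($html);
    $changed = isset($hashes[$url]) && $hashes[$url] !== $current;
    $hashes[$url] = $current;

    return [$changed, $hashes];
}

// Typical use inside the daily job (network call left commented out):
// $html = file_get_contents($url);   // reads the entire page over HTTP
// [$changed, $hashes] = check_page($hashes, $url, $html);
```

Returning the updated map instead of writing it to disk keeps the comparison separate from however you choose to store the hashes.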
RE: [PHP] Finding out when a Web page has changed
Yeah, true. Maybe you could just ereg() out the content. Each URL would need its own ereg, though, so it won't be as easy to set up.

But, technically, if the quote changes, then the page has been updated, even if it's dynamic. How do you define "updated"?

---John Holmes...

> -----Original Message-----
> From: Justin French [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 26, 2002 10:25 AM
> To: Marek Kilimajer; PHP
> Subject: Re: [PHP] Finding out when a Web page has changed
>
> Same with sites that have negligible daily changes (like today's date
> dynamically inserted), or random changes (a random quote, tip, stock
> quote, product, image, etc etc would all screw that up).
>
> Justin
>
> on 26/09/02 11:03 PM, Marek Kilimajer ([EMAIL PROTECTED]) wrote:
>
> > Hope the sites have no banners :), they change all the time
> >
> > [SNIP]
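A rough sketch of the "one pattern per URL" idea, in case it helps. Note that ereg() was removed in PHP 7, so this uses preg_match() instead. The URL, the pattern, and the names $contentPatterns and extract_content() are all placeholders invented for the example; you would have to work out a real pattern for each site by hand.

```php
<?php
// Sketch only: ereg() no longer exists in PHP 7+, so preg_match() stands in.
// The URL and pattern below are made-up placeholders, one entry per site.
$contentPatterns = [
    'http://example.com/news.html' => '~<div id="content">(.*?)</div>~is',
];

// Return only the part of the page that counts as "content" for this URL.
// Falls back to the whole page when no pattern is configured or it fails
// to match, so an unknown layout is still compared in full.
function extract_content(string $url, string $html, array $patterns): string
{
    if (isset($patterns[$url]) && preg_match($patterns[$url], $html, $match) === 1) {
        return $match[1];
    }

    return $html;
}
```

Whatever extract_content() returns is what you would then save or hash, so a rotating quote or date outside the matched region no longer counts as a change.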
Re: [PHP] Finding out when a Web page has changed
Same with sites that have negligible daily changes (like today's date dynamically inserted), or random changes (a random quote, tip, stock quote, product, image, etc etc would all screw that up).

Justin

on 26/09/02 11:03 PM, Marek Kilimajer ([EMAIL PROTECTED]) wrote:

> Hope the sites have no banners :), they change all the time
>
> John Holmes wrote:
>
>> You could cache/save the actual contents of the file, then when you read
>> it next time, compare it to what you saved and see if it changed. You
>> may want to filter out everything but what's between <body> and </body>,
>> so you're not thinking it changed just b/c of something in the
>> headers...
>>
>> ---John Holmes...
>>
>> [SNIP]
Re: [PHP] Finding out when a Web page has changed
Marek Kilimajer wrote:

> Hope the sites have no banners :), they change all the time

But the URL to the banners will be the same, so that's no change in the HTML code ;-))

> [SNIP]
Re: [PHP] Finding out when a Web page has changed
Hope the sites have no banners :), they change all the time

John Holmes wrote:

>You could cache/save the actual contents of the file, then when you read
>it next time, compare it to what you saved and see if it changed. You
>may want to filter out everything but what's between <body> and </body>,
>so you're not thinking it changed just b/c of something in the
>headers...
>
>---John Holmes...
>
> [SNIP]
RE: [PHP] Finding out when a Web page has changed
You could cache/save the actual contents of the file, then when you read it next time, compare it to what you saved and see if it changed. You may want to filter out everything but what's between <body> and </body>, so you're not thinking it changed just b/c of something in the headers...

---John Holmes...

> -----Original Message-----
> From: Vikram Vaswani [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 26, 2002 7:04 AM
> To: [EMAIL PROTECTED]
> Subject: [PHP] Finding out when a Web page has changed
>
> [SNIP]
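Roughly, the compare-against-a-saved-copy idea could look like the sketch below. It assumes the filter is meant to keep only what sits inside the page's body element, so that changes in the head section are ignored. The function names extract_body() and body_changed() are invented for the example, and a regex is only an approximation of real HTML parsing.

```php
<?php
// Sketch only: keeps what is inside the body element so that changes
// elsewhere in the document (title, meta tags, ...) are ignored.
function extract_body(string $html): string
{
    if (preg_match('~<body\b[^>]*>(.*?)</body\s*>~is', $html, $match) === 1) {
        return $match[1];
    }

    // No body element found: fall back to comparing the whole document.
    return $html;
}

// Compare yesterday's saved copy with today's download, body content only.
function body_changed(string $savedHtml, string $freshHtml): bool
{
    return extract_body($savedHtml) !== extract_body($freshHtml);
}
```

In the cron job you would load the copy saved on the previous run, download the page again, call body_changed(), and then overwrite the saved copy with the fresh one.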
[PHP] Finding out when a Web page has changed
Hi all,

I need to write an application that accepts a list of URLs and checks them on a daily basis (via cron) to see if the pages have changed in the past day.

I need some help with this. Does anyone know the most optimal way to find out when a particular Web page has been modified? I am thinking about using the Last-Modified: HTTP header - however, not all servers return this header - any ideas on what the fallback should be?

TIA,

Vikram
--
"I find your lack of faith disturbing."
--Darth Vader
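As a starting point, here is a minimal sketch of the Last-Modified check itself. It sends a HEAD request so that only headers come back, and returns null when the header is absent, which is the point where some content-based fallback would have to take over. It assumes PHP 7.1 or later with allow_url_fopen enabled; the function names last_modified_from_headers() and fetch_last_modified() are invented for the example.

```php
<?php
// Sketch only: assumes PHP 7.1+ with allow_url_fopen enabled.

// Pull the Last-Modified value out of raw header lines as a Unix timestamp.
// Returns null when the header is missing or unparseable, which is the
// signal to fall back to comparing page contents instead.
function last_modified_from_headers(array $headerLines): ?int
{
    foreach ($headerLines as $line) {
        if (stripos($line, 'Last-Modified:') === 0) {
            $value = trim(substr($line, strlen('Last-Modified:')));
            $timestamp = strtotime($value);

            return $timestamp === false ? null : $timestamp;
        }
    }

    return null;
}

// Hypothetical helper: issue a HEAD request so no page body is transferred.
function fetch_last_modified(string $url): ?int
{
    $context = stream_context_create(['http' => ['method' => 'HEAD', 'timeout' => 10]]);
    $headerLines = @get_headers($url, false, $context);

    return $headerLines === false ? null : last_modified_from_headers($headerLines);
}
```

The daily job would compare the returned timestamp with the one stored on the previous run, and treat a null result as "this server does not say", not as "unchanged".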