[PHP] Re: Crawlers (was parsing large files - PHP or Perl)

2005-02-17 Thread Jamie Alessio
> Is there anyone on this list who has written fast and decent
> crawlers in PHP who would be willing to share their experiences?

My first inclination would be to use an existing crawler to grab the 
pages and store all the files locally (even if only temporarily). Then, 
you can use PHP to do whatever type of processing you want on those 
files and can even have PHP crawl deeper based on links in those files 
if necessary. I'd have a hard time believing I could implement a better 
web crawler on my own than what is already available from projects that 
focus on crawling. What about existing search systems like:

Nutch - http://www.nutch.org
mnoGoSearch - http://mnogosearch.org/
htdig - http://www.htdig.org/
or maybe even a wget -r - http://www.gnu.org/software/wget/wget.html
(I'm sure I missed a bunch of great options)

Just an idea - I'd also like to hear if someone has written nice 
crawling code in PHP.
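
A minimal sketch of the "process the stored files in PHP" half of that idea, assuming the pages were already mirrored to disk by an existing crawler. The ./mirror directory, the example.com URL and both function names are made up for illustration, and the code assumes PHP 7 or later with the DOM extension enabled:

```php
<?php
// Sketch only: assumes pages were already mirrored to disk, e.g. with
//   wget -r -P ./mirror http://example.com/
// "./mirror" and example.com are placeholders, not from this thread.

/**
 * Return the unique href values of all <a> tags in an HTML string,
 * skipping empty values and same-page fragment links ("#top").
 */
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them, then discard them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '' && $href[0] !== '#') {
            $links[] = $href;
        }
    }
    // De-duplicate while keeping first-seen order.
    return array_values(array_unique($links));
}

/**
 * Walk a mirror directory and map each .htm/.html file path
 * to the list of links found in that file.
 */
function links_in_mirror(string $dir): array
{
    $result = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (preg_match('/\.html?$/i', $file->getFilename())) {
            $result[$file->getPathname()] = extract_links(
                (string) file_get_contents($file->getPathname())
            );
        }
    }
    return $result;
}
```

Calling links_in_mirror('./mirror') would return an array keyed by file path. The hrefs come back exactly as written in the page, so relative links would still need to be resolved against the page's original URL before handing them to a crawler for a deeper pass.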

- Jamie
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


RE: [PHP] Re: Crawlers (was parsing large files - PHP or Perl)

2005-02-17 Thread Warren Vail
Check out PHPdig;

http://www.phpdig.net/

Warren

 -----Original Message-----
 From: Jamie Alessio [mailto:[EMAIL PROTECTED]
 Sent: Thursday, February 17, 2005 9:22 AM
 To: John Cage
 Cc: php-general@lists.php.net
 Subject: [PHP] Re: Crawlers (was parsing large files - PHP or Perl)
 
 
