Implementing a FIFO queue will certainly work for the crawler, but it is not friendly toward the websites being crawled. A FIFO queue, as you mentioned, means you are doing a breadth-first search through the site, so it is very likely that you will send hundreds of page requests to the same server in a very short amount of time. Depending on how you design your data structures, you should be able to record the time of the last request made to any particular server and pace your requests so that you don't fetch more than one page from it every few minutes.
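To make the pacing idea concrete, here is one minimal sketch in Python: the frontier stays a FIFO queue, but a URL is only handed out once its host has cooled down. The host names, the delay value, and the function names are my own illustrative choices, not anything from the original post.

```python
import time
from collections import deque
from urllib.parse import urlparse

CRAWL_DELAY = 180  # assumed delay: seconds between requests to one server

last_request = {}   # host -> time of the most recent request to it
frontier = deque()  # FIFO queue of URLs still to visit

def schedule(url):
    frontier.append(url)

def next_polite_url():
    """Pop the first URL whose host has not been hit within CRAWL_DELAY.

    URLs whose hosts are still cooling down are rotated to the back of
    the queue, so the breadth-first order is only bent, not abandoned.
    """
    for _ in range(len(frontier)):
        url = frontier.popleft()
        host = urlparse(url).netloc
        now = time.monotonic()
        if now - last_request.get(host, float("-inf")) >= CRAWL_DELAY:
            last_request[host] = now
            return url
        frontier.append(url)  # too soon for this host; retry it later
    return None               # every queued host is still cooling down
```

A real crawler would likely sleep or switch to another host when this returns None rather than spin, but the dictionary of last-request times is the essential piece.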
It sounds like you are implementing this as a recursive call to a crawl function. It seems to me that you should instead parse each URL into its scheme, server, path, filename, and port, and store all of that information in a database of your choice, along with other important data such as the number of times you've visited a site, when the last visit was made, whether the site is still active, etc.

Corey

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
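The bookkeeping Corey describes above could be sketched like this: split each URL into its components with `urllib.parse` and upsert per-page metadata into a table. The use of sqlite3, the table layout, and the column names are all illustrative assumptions, not something specified in the post.

```python
import sqlite3
from urllib.parse import urlparse

conn = sqlite3.connect(":memory:")  # assumed store; any database would do
conn.execute("""
    CREATE TABLE pages (
        scheme     TEXT,
        server     TEXT,
        port       INTEGER,
        path       TEXT,
        visits     INTEGER,
        last_visit REAL,
        active     INTEGER DEFAULT 1,
        PRIMARY KEY (server, port, path)
    )
""")

def record_visit(url, when):
    """Parse url into scheme/server/port/path and count the visit."""
    p = urlparse(url)
    port = p.port or (443 if p.scheme == "https" else 80)
    conn.execute(
        """INSERT INTO pages (scheme, server, port, path, visits, last_visit)
           VALUES (?, ?, ?, ?, 1, ?)
           ON CONFLICT (server, port, path) DO UPDATE SET
               visits = visits + 1,
               last_visit = excluded.last_visit""",
        (p.scheme, p.netloc.split(":")[0], port, p.path or "/", when),
    )
```

With the server and last-visit time in one place, the pacing check from the first paragraph becomes a simple query against this table.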
