Re: [php-list] crawler

William Piper Mon, 25 Aug 2008 06:15:34 -0700

shivam0101 wrote:
> 
> 
> Hi,
> 
> After searching Google for "crawler", this is what I understand.
> 
> A crawler extracts all the links of a site until no new links are
> found. It keeps the links either in a database or in a file. The
> search engine then compares the links against the search keyword and
> prints the ones that match.
> 
> For example, if the link contains 'ABC' and the search key is 'ABC',
> then that link will be printed.
> 
> I have a members page which lists all the members of the site. It
> contains the name, age, SPAM, etc. of each member. Since there are
> more than 1000 members, I list 10 members per page, so the link
> is <a href='members.php?page_id=1'> Next </a>. The page_id
> goes 2, 3, ... and so on.
> 
> Suppose a member named 'ABC' is listed on page 10 of the members
> page, i.e. <a href='members.php?page_id=10'> contains the details of
> member 'ABC'. If I give the search keyword 'ABC', how will the crawler
> get 'ABC' when it crawls the members page?
> 
> Thanks
>


You should dynamically generate a meta keywords tag and a page title on 
each page to have a chance of being near the top of any search. In the 
meta keywords tag, put the search words that should lead to that page. 
The crawler reaches page 10 by following the 'Next' links from page to 
page. When it gets to a page, it reads the keywords, title, and text 
and caches them. Later, when someone searches for 'abc' company, the 
search engine will know that page_id=10 is the page to return. Of 
course, for a term like 'abc' company there will be a gazillion hits, so 
you will probably not be anywhere near the top of the list.
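
To make that concrete, here is a rough sketch of the whole round trip. It 
is in Python only to keep it short; the same logic ports directly to PHP. 
Everything in it is an assumption for illustration: the member data is 
made up, render_page(), PageParser, crawl() and fetch() are hypothetical 
names, and a real search engine's crawler is far more involved than this.

```python
import re
from html import escape
from html.parser import HTMLParser

PER_PAGE = 10  # members listed per page, as in the question


def render_page(page_id, members, last_page):
    """Render one listing page with a dynamic title and meta keywords tag."""
    names = ", ".join(m["name"] for m in members)
    rows = "".join("<li>%s, %d</li>" % (escape(m["name"]), m["age"])
                   for m in members)
    nxt = ""
    if page_id < last_page:
        nxt = "<a href='members.php?page_id=%d'> Next </a>" % (page_id + 1)
    return ("<html><head><title>Members page %d: %s</title>"
            "<meta name=\"keywords\" content=\"%s\"></head>"
            "<body><ul>%s</ul>%s</body></html>"
            % (page_id, escape(names), escape(names, quote=True), rows, nxt))


class PageParser(HTMLParser):
    """Collects the links plus every word from keywords, title and text."""

    def __init__(self):
        super().__init__()
        self.links, self.words = [], set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "meta" and attrs.get("name") == "keywords":
            self.words.update(re.findall(r"\w+", attrs.get("content", "").lower()))

    def handle_data(self, data):
        # Called for the title text as well as the body text.
        self.words.update(re.findall(r"\w+", data.lower()))


def crawl(start_url, fetch):
    """Follow links until no new ones turn up; return a word -> URLs index."""
    index, seen, queue = {}, set(), [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(fetch(url))
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        queue.extend(parser.links)
    return index


# Made-up site: 100 members, with 'ABC' landing on page 10.
members = [{"name": "Member%03d" % i, "age": 20 + i % 50} for i in range(100)]
members[95]["name"] = "ABC"
last_page = (len(members) + PER_PAGE - 1) // PER_PAGE


def fetch(url):
    """Stand-in for an HTTP request: renders the requested page in memory."""
    page_id = int(url.split("page_id=")[1])
    start = (page_id - 1) * PER_PAGE
    return render_page(page_id, members[start:start + PER_PAGE], last_page)


index = crawl("members.php?page_id=1", fetch)
result = sorted(index.get("abc", ()))
print(result)  # ['members.php?page_id=10']
```

The crawler only ever starts from page 1. It finds 'ABC' because each 
page links to the next one, and every page it visits gets its words 
recorded against that page's URL.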

There are definitely ways to get your page to the top of the natural 
(unpaid) search listings, even when there are a gazillion hits, but that 
is a topic for another discussion on another list.

If you do not want the crawler to go through your pagination, simply add 
an entry to robots.txt:

    User-agent: *
    Disallow: /*?*

This tells crawlers that honor the '*' wildcard not to fetch any URL 
containing a '?', from the root directory down.
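
If you want to check which of your URLs a wildcard rule like 
Disallow: /*?* would block, here is a minimal sketch of the matching, 
again in Python for brevity. The function names rule_to_regex() and 
is_disallowed() are made up for this example, and it only mimics how 
wildcard-aware crawlers are documented to treat '*' and a trailing '$'. 
It is not any crawler's actual code.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using '*' and a trailing '$'."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' matches any run of characters; everything else is literal.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(url_path, disallow_rules):
    """True if any Disallow rule matches from the start of the path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


rules = ["/*?*"]
print(is_disallowed("/members.php?page_id=10", rules))  # True
print(is_disallowed("/members.php", rules))             # False
```

Note that the rule blocks every URL with a query string, not just the 
pagination, so members.php itself stays crawlable but none of the 
page_id pages do.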

-bp
