Diego Basch wrote:
In my opinion, the only significant improvement would
be the ability to reduce cloaking. Cloaked servers
present a different page to a crawler based on two
things: the user agent and the IP address range. If
crawlers used random IP addresses and Mozilla as a
user agent, cloakers would have a harder time telling
them apart from regular users.
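
A minimal sketch of the check described above, assuming a cloaking server decides only on the user agent and the source address range. The names `KNOWN_CRAWLER_AGENTS`, `KNOWN_CRAWLER_NETWORKS`, `looks_like_crawler` and `choose_page` are hypothetical, and the agent string and address ranges are documentation placeholders, not values from any real crawler.

```python
import ipaddress

# Hypothetical placeholders for illustration only.
KNOWN_CRAWLER_AGENTS = ("examplebot",)
KNOWN_CRAWLER_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


def looks_like_crawler(user_agent: str, remote_ip: str) -> bool:
    """Return True if either signal marks the request as a crawler."""
    agent = user_agent.lower()
    # Signal 1: the user agent contains a known crawler token.
    if any(token in agent for token in KNOWN_CRAWLER_AGENTS):
        return True
    # Signal 2: the source address falls inside a known crawler range.
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in KNOWN_CRAWLER_NETWORKS)


def choose_page(user_agent: str, remote_ip: str) -> str:
    """Pick which variant of the page a cloaking server would send."""
    if looks_like_crawler(user_agent, remote_ip):
        return "page-for-crawlers"
    return "page-for-users"
```

Under these assumptions, a request that sends a browser-like user agent from an address outside the listed ranges matches neither signal and receives the same page as a regular user, which is the effect the proposal is after.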
  

That means violating the robots.txt recommendation. You can do that from a single location.
What is the difference?
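
One possible reading of this objection, sketched with Python's standard `urllib.robotparser`: rules in robots.txt are keyed on the crawler's user agent, so a crawler that announces itself as Mozilla no longer matches the rules written for it. The robots.txt content, the `ExampleBot` name, the URL and the `allowed` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is kept out of /private/,
# every other user agent is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Check a URL against the sample rules for the given user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "http://example.com/private/report.html"
honest = allowed("ExampleBot/1.0", URL)   # crawler identifies itself
spoofed = allowed("Mozilla/5.0", URL)     # crawler claims to be a browser
```

With these sample rules the self-identified crawler is refused and the spoofed one is let through, and nothing in the check depends on how many locations the requests come from.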
Cloaking is one of the main reasons Google's relevance
has decreased over time; this is why I believe a
distributed crawling approach has some merit. Of
course, it would be pointless if spammers could tamper
with the crawling process.

Diego.

--- Antonio Gulli <[EMAIL PROTECTED]> wrote:
  
Diego Basch wrote:

    
The main problem with this approach is:

How do you stop malicious users from reporting bogus changes?
 

      
My question is whether such an approach (distributed spidering)
is needed at all. Spidering costs are peanuts. Indexing and, above all,
serving the queries are the main costs.

-- 
"With a heavy dose of fear and violence, and a lot of money
for projects, I think we can convince people that we are here
to help them." LT. COL. NATHAN SASSAMAN NYtimes.com 7th, Dec 2003

http://www.di.unipi.it/~gulli/





    
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers





  


