[ http://issues.apache.org/jira/browse/NUTCH-13?page=comments#action_63403 ]

Andrzej Bialecki  commented on NUTCH-13:
----------------------------------------

Let's not be too hasty... There are legitimate cases where numeric IPs, even
from the private address spaces, are appropriate and wanted.

The Fetcher needs to resolve every host name to an IP address anyway in order
to contact the server. We could perform this resolution during fetchlist
generation and apply both URL-based and numeric IP filters as needed to trim
the fetchlists. The Fetcher could also record the IP address in the segment
data, so that IP-based filtering can be applied again when updating the
database.
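A rough sketch of the numeric IP filter described above, written in Java
since that is Nutch's language. The class `IpFilter` and its two switches are
hypothetical, not existing Nutch API. The point is that loopback and private
ranges are configurable rather than rejected outright, so installations that
legitimately crawl them can keep them. Classification uses the standard
`java.net.InetAddress` predicates.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

/**
 * Hypothetical IP-based URL filter, applied after resolving a URL's host.
 * Each address class can be allowed or rejected independently.
 */
public class IpFilter {
    private final boolean allowLoopback; // 127.0.0.0/8, ::1, 0.0.0.0
    private final boolean allowPrivate;  // RFC 1918 site-local and link-local

    public IpFilter(boolean allowLoopback, boolean allowPrivate) {
        this.allowLoopback = allowLoopback;
        this.allowPrivate = allowPrivate;
    }

    /** Returns true if an already-resolved address passes the filter. */
    public boolean accept(InetAddress addr) {
        // Treat the wildcard address like loopback: both mean "this machine".
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
            return allowLoopback;
        }
        if (addr.isSiteLocalAddress() || addr.isLinkLocalAddress()) {
            return allowPrivate;
        }
        return true; // public address
    }

    /**
     * Resolves the URL's host and applies the filter. Malformed URLs,
     * host-less URLs and unresolvable hosts are rejected, since the
     * Fetcher could not contact them anyway.
     */
    public boolean accept(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false;
        }
        // InetAddress.getByName(null) returns the loopback address,
        // so a missing host must be rejected explicitly.
        if (host == null) {
            return false;
        }
        try {
            return accept(InetAddress.getByName(host));
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

With `new IpFilter(false, false)`, a URL whose host resolves to 127.0.0.1
(as in the www.tik24.de report) would be dropped from the fetchlist, while
`new IpFilter(true, true)` keeps it for installations that want it.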

For any large Nutch installation, a local caching DNS server is a must anyway,
so this should not result in significantly increased outside traffic or
delays.
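On top of a local caching DNS server, the generator itself could avoid repeat
lookups by resolving each distinct host only once per run. A minimal sketch,
assuming a hypothetical single-threaded `HostCache` class (not part of Nutch)
that memoizes results, including failures, in a plain map. The JVM also keeps
its own lookup cache, governed by the `networkaddress.cache.ttl` security
property.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical per-run host cache: each distinct host name is resolved at
 * most once. Not thread-safe; intended for a single generator pass.
 */
public class HostCache {
    // Optional.empty() records a failed lookup so it is not retried.
    private final Map<String, Optional<InetAddress>> cache = new HashMap<>();
    private int lookups = 0;

    public Optional<InetAddress> resolve(String host) {
        // Host names are case-insensitive, so normalise the cache key.
        String key = host.toLowerCase(Locale.ROOT);
        return cache.computeIfAbsent(key, h -> {
            lookups++; // counts real resolver calls, not cache hits
            try {
                return Optional.of(InetAddress.getByName(h));
            } catch (UnknownHostException e) {
                return Optional.empty();
            }
        });
    }

    /** Number of times the underlying resolver was actually called. */
    public int lookups() {
        return lookups;
    }
}
```

Caching failures means a host that fails once stays unresolved for the rest
of the run; a longer-lived cache would probably want a TTL instead.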

> If dns points to 127.0.0.1, the url is also crawled
> ---------------------------------------------------
>
>          Key: NUTCH-13
>          URL: http://issues.apache.org/jira/browse/NUTCH-13
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Reporter: Matthias Jaekle
>     Priority: Minor

>
> For example, www.tik24.de points to 127.0.0.1.
> If you follow a link to www.tik24.de, the fetcher will crawl content from
> your own machine.
> Wrong DNS entries could create unwanted entries in segments.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
