On Wed, Jun 23, 2010 at 5:27 PM, Dennis Kubes <[email protected]> wrote:

> You may still see some urls that *seem* to be outside of your domains list
> while using the domain urlfilter.  Remember the following:
>
>  1. Urls are checked in order of domain suffix, domain name, and
>     hostname.  If your list has .com and something.net, urls in
>     something.com will also get picked up, because the .com suffix
>     matches.
>  2. This doesn't handle redirects; it only handles generated urls.  If
>     your domain urls file has something.com and the original url is
>     http://something.com/something.html but it redirects to, for
>     example, http://ww2.something.net/redirect/login.html, the url
>     will still get crawled and saved.
>
> For verification, grep through the logs to be sure.  Be aware of the
> redirects if you see a few urls that don't match your patterns.  If you see
> a lot that don't match, then something isn't working.
>
> Dennis
>
>
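To make point 1 in the quoted message concrete, here is a minimal sketch of
the suffix / domain / hostname check order.  It is not the plugin's actual
code: the function name is_allowed is hypothetical, list entries are assumed
to be written without a leading dot ("com" rather than ".com"), and the
suffix and domain are approximated as the last one and last two labels of
the hostname, which is wrong for suffixes such as co.uk.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed):
    """Return True if the URL's suffix, domain, or hostname is in `allowed`.

    Simplifying assumption: suffix = last label, domain = last two labels.
    """
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    labels = host.split(".")
    suffix = labels[-1]             # e.g. "com"
    domain = ".".join(labels[-2:])  # e.g. "something.com"
    # Try the three candidates in the order described in the quoted message.
    return any(candidate in allowed for candidate in (suffix, domain, host))


# A list holding a bare suffix accepts every host under that suffix.
allowed = {"com", "something.net"}
print(is_allowed("http://something.com/something.html", allowed))
print(is_allowed("http://example.org/", allowed))
```

With a list containing only something.com, the same check rejects
http://ww2.something.net/redirect/login.html, so a url like that showing up
in a crawl would point to the redirect case in point 2 rather than to a
broken filter.
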
Thanks Dennis, that makes sense.  The domain filter seems to be working and
is all I need for now.

-Max
