Eric J. Christeson-2 wrote:
>
> On Thu, May 22, 2008 at 07:46:16PM -0700, foobar3001 wrote:
> Did a quick scan of the page in question, and I noticed the urls are of
> this form:
> http://www.geekzone.co.nz/blog.asp?blogid=207
>
> Could you filter like
>
> +^http://([a-z0-9]*\.)*geekzone.co.nz/blog.asp\?blogid=207
>
Hello!
Thank you very much for the reply. Yes, I had noticed that as well,
but filtering site-specific URLs like that is what I want to avoid.
I'm trying to find a generic solution, not something that's specific
to this (or any other) site.
Basically, I want to tell the Nutch crawler to follow links outside
the specified domain up to a certain depth, to see whether they lead
back to pages belonging to the specified domain.
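
To make the idea concrete, here is a rough sketch in Python. It is not
Nutch code and does not use any Nutch API; the names in_domain,
crawl_with_detour, get_links and max_detour are made up for this
illustration, and get_links stands in for whatever fetches and parses
a page. The crawl follows off-domain links only while the number of
consecutive off-domain hops stays within max_detour, and resets that
count whenever it lands back on the target domain.

```python
from collections import deque
from urllib.parse import urlparse


def in_domain(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def crawl_with_detour(seeds, domain, get_links, max_detour=2):
    """Collect in-domain URLs reachable from the seeds.

    Off-domain pages may be traversed, but never more than max_detour
    of them in a row. get_links(url) is a hypothetical callback that
    returns the outgoing links of a page.
    """
    # best[url] = fewest consecutive off-domain hops seen when reaching url
    best = {seed: 0 for seed in seeds}
    queue = deque((seed, 0) for seed in seeds)
    found = set()

    while queue:
        url, detour = queue.popleft()
        if detour > best[url]:
            continue  # a shorter detour to this page was queued later
        if in_domain(url, domain):
            found.add(url)
        for link in get_links(url):
            # Returning to the target domain resets the detour counter.
            next_detour = 0 if in_domain(link, domain) else detour + 1
            if next_detour > max_detour:
                continue
            if link in best and best[link] <= next_detour:
                continue
            best[link] = next_detour
            queue.append((link, next_detour))

    return found
```

With max_detour=0 this behaves like a plain same-domain crawl; larger
values let the crawl wander further off-site before giving up on a path.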
--
View this message in context:
http://www.nabble.com/Problems-with-indexing-sub-section-of-a-site-tp17417650p17451041.html
Sent from the Nutch - User mailing list archive at Nabble.com.