Re: Problems with indexing sub-section of a site

foobar3001 Sat, 24 May 2008 12:22:24 -0700

Eric J. Christeson-2 wrote:
> 
> On Thu, May 22, 2008 at 07:46:16PM -0700, foobar3001 wrote:
> Did a quick scan of the page in question, and I noticed the urls are of
> this form:
>       http://www.geekzone.co.nz/blog.asp?blogid=207
> 
> Could you filter like 
> 
>       +^http://([a-z0-9]*\.)*geekzone.co.nz/blog.asp\?blogid=207
> 
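For readers unsure how a rule like the one quoted above is applied: the comments in Nutch's stock regex-urlfilter.txt describe each line as a + (include) or - (exclude) sign followed by a regular expression. The first matching line decides, and a URL that matches nothing is ignored. The sketch below mimics that behaviour so a rule can be tried against sample URLs. UrlRuleFilter, addRule and accepts are hypothetical names, not Nutch classes. The sketch uses Matcher.find(), so a pattern may match anywhere in the URL unless it is anchored with ^.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a +/- regex URL filter; not Nutch's own class.
class UrlRuleFilter {
    // One parsed rule line: include (+) or exclude (-), plus its compiled regex.
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses a line such as "+^http://..." or "-\.gif$" and appends it in order.
    void addRule(String line) {
        char sign = line.charAt(0);
        if (sign != '+' && sign != '-') {
            throw new IllegalArgumentException("rule must start with + or -: " + line);
        }
        rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
    }

    // The first rule whose regex is found in the URL decides;
    // a URL that matches no rule is rejected.
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlRuleFilter filter = new UrlRuleFilter();
        filter.addRule("+^http://([a-z0-9]*\\.)*geekzone.co.nz/blog.asp\\?blogid=207");
        System.out.println(filter.accepts("http://www.geekzone.co.nz/blog.asp?blogid=207")); // true
        System.out.println(filter.accepts("http://www.geekzone.co.nz/blog.asp?blogid=42"));  // false
        System.out.println(filter.accepts("http://example.org/"));                          // false
    }
}
```

The unescaped dots in geekzone.co.nz and blog.asp match any character, so the quoted rule is slightly looser than it looks. Writing geekzone\.co\.nz/blog\.asp is stricter.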

Hello!

Thank you very much for the reply. Yes, I had noticed that as well,
but filtering on site-specific URLs like that is what I wanted to avoid.
I'm trying to find a generic solution, not one that's specific
to this (or any other) site.

Basically, I want to tell the Nutch crawler to follow links outside
the specified domain up to a certain depth, to see whether they lead
back to pages belonging to that domain.
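A rough sketch of the requested behaviour follows. It is written as plain Java over an in-memory link graph rather than against Nutch, and it does not claim that Nutch offers such an option. OffDomainDepthCrawl, crawl, inTarget and maxOffDomainHops are hypothetical names. The crawl is breadth-first. A page on the target domain (or a subdomain) is kept and resets an "off-domain hop" counter to 0. Each page on another domain adds one hop, and links are no longer followed once the counter would exceed the limit.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch over an in-memory link graph; not a Nutch plugin or option.
class OffDomainDepthCrawl {

    // True if the URL's host is targetHost itself or one of its subdomains.
    static boolean inTarget(String url, String targetHost) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase();
        return host.equals(targetHost) || host.endsWith("." + targetHost);
    }

    // Breadth-first crawl from seed, keeping target-domain pages and limiting
    // consecutive off-domain pages to maxOffDomainHops.
    static Set<String> crawl(Map<String, List<String>> links, String seed,
                             String targetHost, int maxOffDomainHops) {
        Set<String> kept = new LinkedHashSet<>();        // target-domain pages, in discovery order
        Map<String, Integer> bestHops = new HashMap<>(); // fewest off-domain hops seen per URL
        Deque<String> queue = new ArrayDeque<>();
        bestHops.put(seed, 0);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            int hops = bestHops.get(url);
            if (inTarget(url, targetHost)) {
                kept.add(url);
            }
            for (String next : links.getOrDefault(url, List.of())) {
                // Landing back on the target domain resets the count; elsewhere adds one.
                int nextHops = inTarget(next, targetHost) ? 0 : hops + 1;
                if (nextHops > maxOffDomainHops) {
                    continue; // wandered too far from the target domain
                }
                Integer previous = bestHops.get(next);
                // Queue a URL if it is new or reached via a shorter off-domain detour.
                if (previous == null || nextHops < previous) {
                    bestHops.put(next, nextHops);
                    queue.add(next);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String seed = "http://www.geekzone.co.nz/blog.asp?blogid=207";
        Map<String, List<String>> links = Map.of(
            seed, List.of("http://example.org/post"),
            "http://example.org/post", List.of(
                "http://www.geekzone.co.nz/blog.asp?blogid=207&page=2",
                "http://example.net/x"),
            "http://example.net/x", List.of("http://www.geekzone.co.nz/far"));
        System.out.println(crawl(links, seed, "geekzone.co.nz", 1)); // seed and page=2 only
        System.out.println(crawl(links, seed, "geekzone.co.nz", 2)); // also .../far
    }
}
```

Returning to the target domain resets the counter. A chain that goes out, comes back, and goes out again is therefore judged per detour rather than by its total distance from the seed.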

-- 
View this message in context: 
http://www.nabble.com/Problems-with-indexing-sub-section-of-a-site-tp17417650p17451041.html
Sent from the Nutch - User mailing list archive at Nabble.com.
