OK, thanks. As far as crawling the entire subdomain goes, what exact command
would I use?

Since depth says how many pages deep to go, is there any way to hit
every single page without specifying depth? Or should I just say
depth=10? Also, topN is no longer used, correct?

Stefan Neufeind wrote:

>Matthew Holt wrote:
>
>>Question,
>>   I'm trying to index a subdomain of my intranet. How do I make it
>>index the entire subdomain, but not index any pages off of the
>>subdomain? Thanks!
>
>Have a look at crawl-urlfilter.txt in the conf/ directory.
>
># accept hosts in MY.DOMAIN.NAME
>+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
>
># skip everything else
>-.
>
>
>Regards,
> Stefan
>
>  
>
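
To see what the two quoted filter rules would let through, here is a rough
Python sketch. It assumes the filter tries the rules top to bottom and lets
the first matching pattern decide, with anything unmatched being skipped.
The host "intranet.example.com" is a made-up stand-in for MY.DOMAIN.NAME, and
accepts() is a hypothetical helper, not part of Nutch. The dots in the host
name are escaped here, since an unescaped "." matches any character.

```python
import re

# Rules in the same order as the quoted crawl-urlfilter.txt snippet.
# "intranet.example.com" is a hypothetical stand-in for MY.DOMAIN.NAME.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*intranet\.example\.com/"),  # accept this host and its subdomains
    ("-", r"."),                                              # skip everything else
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches is a '+' rule.

    Assumption: first match wins, and a URL matching no rule is skipped.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

if __name__ == "__main__":
    for url in [
        "http://intranet.example.com/index.html",
        "http://docs.intranet.example.com/a/b",
        "http://www.example.com/",
        "https://intranet.example.com/",
    ]:
        print(url, accepts(url))
```

Note that under these rules an https:// URL on the same host is skipped as
well, because the accept pattern only covers http://.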


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general