OK, thanks. As for crawling the entire subdomain, what exact command would I use? Depth says how many pages deep to go. Is there any way to hit every single page without specifying a depth, or should I just say depth=10? Also, topN is no longer used, correct?

Stefan Neufeind wrote:

>Matthew Holt wrote:
>
>>Question,
>>   I'm trying to index a subdomain of my intranet. How do I make it
>>index the entire subdomain, but not index any pages off of the
>>subdomain? Thanks!
>
>Have a look at crawl-urlfilter.txt in the conf/ directory.
>
># accept hosts in MY.DOMAIN.NAME
>+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
>
># skip everything else
>-.
>
>Regards,
> Stefan
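
For reference, here is a rough sketch of how I read the two quoted filter rules. It is not Nutch code. It assumes the rules are tried top to bottom, that the first pattern found in the URL decides the outcome ("+" accepts, "-" rejects), and that a URL matching no rule is dropped. The names RULES and accepts are made up for illustration, and MY.DOMAIN.NAME is the placeholder from the quoted reply.

```python
import re

# The two rules from the quoted crawl-urlfilter.txt, in file order.
# MY.DOMAIN.NAME is the placeholder from the reply, not a real host.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/"),  # accept hosts in MY.DOMAIN.NAME
    ("-", r"."),                                      # skip everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule.

    Assumption: first match wins, and a URL that matches no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in (
        "http://MY.DOMAIN.NAME/index.html",
        "http://wiki.MY.DOMAIN.NAME/page",
        "http://example.com/",
        "https://MY.DOMAIN.NAME/index.html",
    ):
        print(url, accepts(url))
```

Two things stand out when reading the pattern this way: it only covers http:// URLs, so https:// pages fall through to the skip rule, and the host prefix [a-z0-9]* does not allow hyphens, so a host such as my-host.MY.DOMAIN.NAME would be skipped as well.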
