In addition to Lewis's suggestions, please try giving a bigger value to topN.
If the configuration files are set up correctly, you should see more pages
crawled.
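
A rough sketch of why topN matters. As far as I understand it, -topN caps how
many URLs are selected for fetching in each round and -depth is the number of
rounds, so depth times topN is an upper bound on the pages fetched. With
-topN 10 -depth 3 that is at most 30 pages. The values below (TOPN=1000 and the
variable names) are only illustrative assumptions; pick numbers that suit your
site.

```shell
#!/bin/sh
# Illustrative values (assumptions, not taken from the thread).
SEED_DIR=urls   # seed directory, as in the original command
DEPTH=3         # number of generate/fetch/parse/update rounds
TOPN=1000       # max URLs selected for fetching in each round

# Upper bound on pages fetched over the whole crawl.
MAX_PAGES=$((DEPTH * TOPN))
echo "At most $MAX_PAGES pages will be fetched"

# Build the Nutch 1.x crawl command. It is only printed here;
# run the printed command from the Nutch runtime directory.
CRAWL_CMD="bin/nutch crawl $SEED_DIR -depth $DEPTH -topN $TOPN"
echo "$CRAWL_CMD"
```

If the crawl still stops at the seed URLs with a larger topN, the limit is
probably not the cause, and the URL filters or the parser are the next things to
check.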


On Thu, Mar 14, 2013 at 12:30 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> You can use the parsechecker from the nutch script to see what outlinks you
> should be picking up.
> Once you know how the crawler is configured, you can begin to determine
> why outlinks are either not being parsed out or not subsequently being
> fetched.
> hth
>
> On Wed, Mar 13, 2013 at 6:13 PM, Dat Tran <[email protected]> wrote:
>
> > Thanks for your reply. After configuring the urlfilter, I executed this
> > command to crawl:
> > bin/nutch crawl urls -topN 10 -depth 3
> > (urls is the directory where the seed list is located).
> > But it crawls, fetches and parses only the links that are defined in the
> > seed list, not the outlinks.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Iterative-Crawling-tp4046501p4047209.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> *Lewis*
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>
