I'm using 1.3. This is a new setup, so I'm running the latest versions. I did inject the URLs already; the part I was having issues with was the fetch, etc. I'm following the steps at Lucid Imagination's "Using Nutch with Solr" <http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/>, except that I already had Nutch set up and configured.
When did noParsing change? I noticed that the Nutch wiki is out of date, so I'm not sure what the current setup is. The log data made some mention of Hadoop, but I don't remember what it was. I'll see if it happens again and post the message.

On Thu, Sep 22, 2011 at 10:44 AM, lewis john mcgibbney <[email protected]> wrote:

> Hi Bai,
>
> You haven't mentioned which Nutch version you're using... it would be
> good if you could.
>
> You haven't injected any seed URLs into your crawldb. From memory, I think
> the -topN parameter should be passed to the generate command.
>
> Just to note, it is not necessary to set noParsing while executing the
> fetch command. This is already the default behaviour. I'm not sure why
> your machine is churning, but this shouldn't be happening. Do you have
> any log data to suggest why this is the case?
>
> On Thu, Sep 22, 2011 at 1:26 PM, Bai Shen <[email protected]> wrote:
>
> > So I was able to get Nutch up and working using the crawl command. I set
> > my depth and topN, and it ran and indexed the pages for me.
> >
> > But now I'm trying to split out the separate pieces in order to
> > distribute them and add my own parser. I'm running the following:
> >
> > bin/nutch generate crawl/crawldb crawl/segments
> > export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`
> > bin/nutch fetch $SEGMENT -noParsing
> > bin/nutch parse $SEGMENT
> > bin/nutch updatedb crawl/crawldb $SEGMENT -filter -normalize
> >
> > I don't see any way to determine how deep to crawl. Is this possible,
> > or do I have to manually manage the db? And if so, how do I do that?
> >
> > And as a side note, why does Nutch invoke Hadoop during the fetch
> > command even though I have noParsing set? After fetching my links, my
> > machine churns for around twenty minutes before finally ending, even
> > though all the fetch threads completed already.
> >
> > Thanks.
>
> --
> *Lewis*
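On the "how deep to crawl" question in the quoted thread: as I understand it, when running the steps individually there is no depth flag at all; depth is simply the number of generate/fetch/parse/updatedb rounds you run, which is what the all-in-one crawl command's -depth parameter drives internally. A minimal sketch of that loop, using the same crawl/crawldb and crawl/segments layout as the commands above (DEPTH and TOPN values are illustrative, and NUTCH is set to "echo bin/nutch" so the loop structure can be dry-run without a Nutch install; drop the echo to run it for real):

```shell
# Sketch only: each round of these four commands is one level of depth.
NUTCH="echo bin/nutch"
DEPTH=3      # illustrative: three rounds == depth 3
TOPN=1000    # illustrative

for round in $(seq 1 "$DEPTH"); do
  $NUTCH generate crawl/crawldb crawl/segments -topN "$TOPN"
  # In a real run, pick up the newly generated segment:
  # SEGMENT=crawl/segments/$(ls -tr crawl/segments | tail -1)
  SEGMENT='crawl/segments/<latest>'
  $NUTCH fetch "$SEGMENT"
  $NUTCH parse "$SEGMENT"
  $NUTCH updatedb crawl/crawldb "$SEGMENT" -filter -normalize
done
```

With this structure the crawldb manages itself: each updatedb merges the round's newly discovered links back into crawl/crawldb, and the next generate selects the top-scoring unfetched URLs from it, so there is no need to edit the db by hand.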

