Hi,

We have a number of archived tutorials which exist here [1]; please feel free to add to the wiki as you see fit. If you do not already have a username and write permissions, you can sign up on the wiki front page.

Thanks

[1] http://wiki.apache.org/nutch/Archive%20and%20Legacy#Nutch_.3C1.3_Tutorials

On Sat, Jul 16, 2011 at 11:48 AM, Kelvin <[email protected]> wrote:
> Dear all,
>
> Just to update, I have solved my problem. Apparently, we also need to edit
> the file conf/crawl-urlfilter.txt, besides conf/regex-urlfilter.txt.
>
> Can we amend this page: http://wiki.apache.org/nutch/NutchTutorialPre1.3
> I am sure many others encounter the same problem as me.
>
> ________________________________
> From: Kelvin <[email protected]>
> To: "'[email protected]'" <[email protected]>
> Sent: Saturday, 16 July 2011 2:32 PM
> Subject: Cannot crawl problem
>
> Dear all,
>
> I was able to get Nutch 1.2 working previously. I have done a clean install
> of Nutch 1.2 now, and I strictly followed the instructions below:
> http://wiki.apache.org/nutch/NutchTutorialPre1.3
>
> But now I have encountered the problem below. Why is that? Do we need to
> set up Tomcat in order to get Nutch crawling working? Previously I set up
> both Tomcat and Nutch together, but now I would like to have Nutch only.
>
> Thank you for your kind help.
>
> depth 3 -topN 50
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> indexer=lucene
> topN = 50
> Injector: starting at 2011-07-16 14:30:29
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-16 14:30:32, elapsed: 00:00:02
> Generator: starting at 2011-07-16 14:30:32
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl

--
Lewis
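
For anyone who lands here with the same "Generator: 0 records selected for fetching" output: the one-step crawl command in Nutch 1.2 applies conf/crawl-urlfilter.txt, whose default rule set ends by skipping everything, so every seed URL is rejected until you add an accept pattern for your own domain. A minimal sketch of the relevant lines, following the pre-1.3 tutorial (MY.DOMAIN.NAME is the tutorial placeholder; substitute your own host):

  # conf/crawl-urlfilter.txt
  # skip URLs containing characters that usually indicate queries, anchors, etc.
  -[?*!@=]
  # accept everything under your own domain, e.g. +^http://([a-z0-9]*\.)*apache.org/
  +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
  # skip everything else
  -.

If you edit conf/regex-urlfilter.txt as well, note that its default final rule is typically "+." (accept everything else) rather than "-.", which is why crawl-urlfilter.txt is usually the file that bites.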


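For completeness, the log above corresponds to an invocation along these lines (the seed file name and URL are illustrative; only the urls/ directory and the -dir, -depth, and -topN values come from the log):

  # seed list: one URL per line, in a file under the urls/ directory
  mkdir urls
  echo 'http://www.example.com/' > urls/seed.txt

  # one-step crawl matching the log: crawl dir "crawl", depth 3, topN 50
  bin/nutch crawl urls -dir crawl -depth 3 -topN 50

The seed URLs must match the accept pattern in crawl-urlfilter.txt, or you will see the same "0 records selected for fetching" result.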