Yes, I have added this to my crawl-urlfilter.txt:

+^http://([a-z0-9]*\.)*(yahoo.com|cnn.com|amazon.com|msn.com|google.com)/


But I still have the problem that I mentioned in my previous mail.
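
For reference, the pattern itself can be checked outside Nutch. The following is only a minimal sketch in Python, not anything Nutch runs: the helper name `accepted` is hypothetical, it assumes the filter looks for the pattern anchored at the start of the URL, and it ignores both the other rules in crawl-urlfilter.txt and any URL normalization Nutch may apply before filtering.

```python
import re

# The rule from crawl-urlfilter.txt with the leading '+' removed.
# In the filter file the '+' marks the line as an "accept" rule;
# it is not part of the regular expression itself.
PATTERN = re.compile(
    r"^http://([a-z0-9]*\.)*(yahoo.com|cnn.com|amazon.com|msn.com|google.com)/"
)


def accepted(url: str) -> bool:
    """Hypothetical helper: True if the pattern is found in the URL.

    Assumption: a 'find'-style match anchored by '^'. This does not
    reproduce Nutch's own normalization or rule ordering.
    """
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    # Compare the seed as written in the urls file against variants.
    for url in (
        "http://www.yahoo.com",
        "http://www.yahoo.com/",
        "http://yahoo.com/news",
    ):
        print(url, accepted(url))
```

In this sketch the seed exactly as written in the urls file (no trailing slash) does not match, because the pattern ends in "/", while the same URL with a trailing slash does. Whether that matters in practice depends on whether Nutch normalizes the seed URL before the filter sees it, which this sketch does not model.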

On 4/10/07, Michael Wechner <[EMAIL PROTECTED]> wrote:
> Meryl Silverburgh wrote:
>
> > Hi,
> >
> > I am trying to set up Nutch.
> > I set up 1 site in my urls file:
> > http://www.yahoo.com
>
>
> Have you added it to the URL/Crawl filters?
>
> Cheers
>
> Michael
>
> >
> > And then I start crawl using this command:
> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> >
> > But I get "No URLs to fetch". Can you please tell me what I am
> > missing?
> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> > crawl started in: crawl
> > rootUrlDir = urls
> > threads = 10
> > depth = 1
> > topN = 5
> > Injector: starting
> > Injector: crawlDb: crawl/crawldb
> > Injector: urlDir: urls
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: done
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: starting
> > Generator: segment: crawl/segments/20070406140513
> > Generator: filtering: false
> > Generator: topN: 5
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: 0 records selected for fetching, exiting ...
> > Stopping at depth=0 - no more URLs to fetch.
> > No URLs to fetch - check your seed list and URL filters.
> > crawl finished: crawl
> >
>
>
> --
> Michael Wechner
> Wyona      -   Open Source Content Management   -    Apache Lenya
> http://www.wyona.com                      http://lenya.apache.org
> [EMAIL PROTECTED]                        [EMAIL PROTECTED]
> +41 44 272 91 61
>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general