Re: Hi

Harry Nutch Thu, 06 May 2010 17:53:32 -0700

Did you check crawl-urlfilter.txt?
Every domain name that you'd like to crawl has to be listed there,
e.g.


# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*mersin\.edu\.tr/
+^http://([a-z0-9]*\.)*tubitak\.gov\.tr/

Also check the property db.ignore.external.links in nutch-default.xml. It
should be set to false, otherwise links that point to other hosts are ignored.
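
If the value needs changing, a property entry like the one below should do it. I am assuming the usual Nutch convention of putting overrides in conf/nutch-site.xml instead of editing nutch-default.xml directly; the entry goes inside the file's existing <configuration> element.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside <configuration> -->
<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
</property>
```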

2010/5/5 Zehra Göçer <zgocer...@hotmail.com>

>
> I have a problem with Nutch. My project is link analysis. I crawled
> "www.mersin.edu.tr", analysed the linkdb, and saw all the mersin.edu.tr
> links. But I also have to find the links on the site that point to other
> sites, for example www.tubitak.gov.tr, and I cannot find them. How can I
> find these links? Please help me.
