Re: JobTracker gets stuck with DFS problems

2010-05-06 Thread Emmanuel de Castro Santana
> Again, this procedure does NOT work when using HDFS - you won't even see the partial output (without some serious hacking)

Got it!

> You can simply set the fetcher.parsing config option to false.

Found it! Thanks for the help.

2010/5/3 Andrzej Bialecki a...@getopt.org
> On 2010-05-03 22:58,

Hi

2010-05-06 Thread Zehra Göçer
I have a problem with Nutch. My project is on link analysis: I crawled www.mersin.edu.tr, analysed the linkdb, and saw all the mersin.edu.tr links. But I also need to find the links from the site to other sites, for example to www.tubitak.gov.tr, and I cannot find them. How can I find these links? Please help me.

Re: Hi

2010-05-06 Thread Harry Nutch
Did you check crawl-urlfilter.txt? Every domain you'd like to crawl has to be listed there, e.g.:

    # accept hosts in MY.DOMAIN.NAME
    +^http://([a-z0-9]*\.)*mersin\.edu\.tr/
    +^http://([a-z0-9]*\.)*tubitak\.gov\.tr/

Also check the property db.ignore.external.links in nutch-default.xml. Should be
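To make the two suggestions concrete, here is a minimal Java sketch of how either check could keep an outlink like http://www.tubitak.gov.tr/ out of the crawl. The class OutlinkFilterSketch and its methods are hypothetical, not Nutch's API. The sketch assumes the regex URL filter works as "first matching +/- rule wins, and a URL matching no rule is rejected". It also assumes that db.ignore.external.links=true drops any outlink whose host differs from the host of the page it was found on.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Nutch's own classes: it imitates the assumed
// behaviour of a crawl-urlfilter.txt rule list plus db.ignore.external.links.
class OutlinkFilterSketch {

    // One "+regex" (accept) or "-regex" (reject) line from the filter file.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(String line) {
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            this.accept = sign == '+';
            this.pattern = Pattern.compile(line.substring(1));
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    OutlinkFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and # comments.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            rules.add(new Rule(trimmed));
        }
    }

    // Assumed semantics: the first rule whose regex is found in the URL
    // decides; a URL that matches no rule at all is rejected.
    boolean acceptsUrl(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    // Assumed meaning of "external": the outlink's host differs
    // (case-insensitively) from the host of the page it was found on.
    static boolean isExternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        return fromHost == null || toHost == null || !fromHost.equalsIgnoreCase(toHost);
    }

    // An outlink is kept only if it is not dropped as external (when that
    // option is on) and it passes the URL filter.
    boolean keepsOutlink(String fromUrl, String toUrl, boolean ignoreExternalLinks) {
        if (ignoreExternalLinks && isExternal(fromUrl, toUrl)) {
            return false;
        }
        return acceptsUrl(toUrl);
    }

    public static void main(String[] args) {
        OutlinkFilterSketch filter = new OutlinkFilterSketch(List.of(
                "# accept hosts in MY.DOMAIN.NAME",
                "+^http://([a-z0-9]*\\.)*mersin\\.edu\\.tr/",
                "+^http://([a-z0-9]*\\.)*tubitak\\.gov\\.tr/",
                "# skip everything else",
                "-."));

        String page = "http://www.mersin.edu.tr/";
        String outlink = "http://www.tubitak.gov.tr/";
        System.out.println(filter.keepsOutlink(page, outlink, false)); // prints "true"
        System.out.println(filter.keepsOutlink(page, outlink, true));  // prints "false"
    }
}
```

Under these assumptions, the tubitak.gov.tr outlink passes the filter because it has its own + rule. It is still lost if the external-link check is switched on. Without the second + line, the trailing "-." rule would reject it either way.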