date:20100506

parse-pdf plugin with external libraries

2010-05-06 Thread Claudio Martella

Hi, I'm trying to rebuild Nutch to compile the parse-pdf plugin with the external libraries (jai_core.jar and jai_codec.jar). So I downloaded the two jars and put them in the lib/ of src/plugins/parse-pdf/. I uncommented the two lines in plugin.xml (both in src/plugins/parse-pdf/ and in plugins/parse-

Re: JobTracker gets stuck with DFS problems

2010-05-06 Thread Emmanuel de Castro Santana

"Again, this procedure does NOT work when using HDFS - you won't even see the partial output (without some serious hacking)" Got it! "You can simply set the fetcher.parsing config option to false." Found it! Thanks for the help. 2010/5/3 Andrzej Bialecki > On 2010-05-03 22:58, Emmanuel de

Hi

2010-05-06 Thread Zehra Göçer

I have a problem with Nutch. My project is link analysis. I crawled "www.mersin.edu.tr", analysed the linkdb, and saw all the mersin.edu.tr links. But I also have to find the links to other sites, for example www.tubitak.gov.tr, and I cannot find them. How can I find these links? Please help me.

Re: Hi

2010-05-06 Thread Harry Nutch

Did you check crawl-urlfilter.txt? All the domain names that you'd like to crawl have to be mentioned, e.g.

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*mersin\.edu\.tr/
+^http://([a-z0-9]*\.)*tubitak\.gov\.tr/

Also check the property db.ignore.external.links in nutch-default.xml. Should be se
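
For anyone who wants to check their filter lines before re-crawling, here is a rough sketch in Python (not Nutch code) of how rules like the two above behave. It assumes that rules are tried top to bottom, that the first rule whose regex is found in the URL decides, and that a URL matching no rule is dropped. The final "-." catch-all line and the names parse_rules and accepts are my own additions for illustration.

```python
import re

# The two "+" rules are copied from the reply above. The trailing "-."
# catch-all is an assumption added here to reject everything else.
RULES = r"""
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*mersin\.edu\.tr/
+^http://([a-z0-9]*\.)*tubitak\.gov\.tr/
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """The first rule whose pattern is found in the URL decides the outcome."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is rejected


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://www.mersin.edu.tr/",
                "http://www.tubitak.gov.tr/index.html",
                "http://www.example.com/"):
        print(url, "accepted" if accepts(url, rules) else "rejected")
```

Note that both patterns start with ^http://, so under these assumptions an https:// URL on the same hosts would fall through to the catch-all and be rejected.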