Thanks. I changed my nutch-default.xml to the following:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
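The `plugin.includes` value is a regular expression over plugin ids. The Python sketch below checks which ids this value selects. It assumes, as Nutch's plugin repository does with Java's `String.matches()`, that an id is loaded only when it matches the whole expression; the candidate ids are illustrative, not a complete list. Under that assumption `parse-js` and `parse-rss` are excluded. That fits the "parser not found" error in the next message: the plugin list appears to decide which parsers are loaded, not which URLs get fetched.

```python
import re

# plugin.includes value copied from the configuration above.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html)|index-basic|"
    "query-(basic|site|url)|summary-basic|scoring-opic|"
    "urlnormalizer-(pass|regex|basic)"
)

# A few plugin ids for illustration (assumed, not an exhaustive list).
CANDIDATE_PLUGINS = [
    "protocol-http",
    "urlfilter-regex",
    "parse-html",
    "parse-js",
    "parse-rss",
    "parse-msword",
    "query-site",
    "urlnormalizer-regex",
]


def active_plugins(includes: str, plugin_ids: list) -> list:
    """Return the ids that match the *whole* plugin.includes pattern.

    Assumption: whole-string matching, mirroring Java's String.matches().
    """
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


if __name__ == "__main__":
    active = active_plugins(PLUGIN_INCLUDES, CANDIDATE_PLUGINS)
    for pid in CANDIDATE_PLUGINS:
        print(f"{pid}: {'included' if pid in active else 'excluded'}")
```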
But I still see this error message, and I don't expect it to try to fetch JS files at all:

Error parsing: http://www.cnn.com/exchange/submit/pokkariJavascript.js: failed(2,200): org.apache.nutch.parse.ParseException: parser not found for contentType=application/x-javascript url=http://www.cnn.com/exchange/submit/pokkariJavascript.js

And why does it fetch the RSS file too?

fetching http://rss.cnn.com/rss/cnn_ireports.rss

Any help is appreciated.

On 4/12/07, Ratnesh,V2Solutions India <[EMAIL PROTECTED]> wrote:
>
> Hi,
> What you can do is remove parse-js and other related plugins from both the
> nutch-site.xml and nutch-default.xml files.
> It's not recommended to make changes in nutch-default.xml, though sometimes
> a change has no effect unless nutch-default.xml is changed too.
>
> So see which changes you need according to your requirements. I am
> sure that once you remove parse-js it won't crawl JavaScript; also try removing
> other plugins such as parse-msword.
>
> I hope that does it.
>
> Ratnesh,V2Solutions,India
>
>
>
> Meryl Silverburgh wrote:
> >
> > Hi,
> >
> > How can I configure Nutch to crawl just HTML links (no images, no
> > JavaScript files, no CSS files)?
> > I also don't want it to record links to non-HTML pages in the crawl database.
> >
> > Thank you.
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/How-to-config-nutch-just-crawl-html-links--tf3562947.html#a9957697
> Sent from the Nutch - User mailing list archive at Nabble.com.
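Which URLs get fetched is governed by the URL filters rather than by the parse plugins. With `urlfilter-regex` enabled, Nutch releases of this period read rules from a text file (typically `regex-urlfilter.txt`, or `crawl-urlfilter.txt` when using the one-step `crawl` command). Each line is `+` or `-` followed by a regex. The first rule that matches a URL decides whether it is kept, and a URL that matches no rule is rejected. The Python sketch below imitates those semantics as an assumption-labelled illustration. The rules in `RULES_TEXT` are hypothetical examples, not Nutch's shipped defaults.

```python
import re
from typing import List, Tuple

# Hypothetical rules in the "+regex" / "-regex" style of regex-urlfilter.txt.
# Illustrative only; not the defaults shipped with Nutch.
RULES_TEXT = r"""
# skip JavaScript, stylesheets, images and RSS feeds
-\.(js|css|gif|jpg|jpeg|png|ico|rss)$
# accept everything else on cnn.com and its subdomains
+^http://([a-z0-9]*\.)*cnn\.com/
"""


def parse_rules(text: str) -> List[Tuple[bool, "re.Pattern"]]:
    """Turn '+regex' / '-regex' lines into (allowed, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, regex = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(regex)))
    return rules


def accept(url: str, rules: List[Tuple[bool, "re.Pattern"]]) -> bool:
    """First rule whose pattern occurs in the URL decides; no match means reject."""
    for allowed, pattern in rules:
        if pattern.search(url):
            return allowed
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in [
        "http://www.cnn.com/",
        "http://www.cnn.com/exchange/submit/pokkariJavascript.js",
        "http://rss.cnn.com/rss/cnn_ireports.rss",
    ]:
        print(url, "->", "fetch" if accept(url, rules) else "skip")
```

Because the first matching rule wins, the `-` rule for file extensions has to come before the broad `+` rule for the site; in the opposite order the `.js` and `.rss` URLs would be accepted.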
