Thanks Chris. Here is my situation: I want to crawl just a local site for our intranet. We have just rolled out an ASP-only website, replacing a pure HTML site. I ran Nutch on the old site and got great results. Since moving to the new site I am having a devil of a time retrieving good information, and I am missing a ton of info altogether. I am not sure what settings I need to change to get good results. One setting I have tried does produce good results, but it seems to crawl other websites and not just my domain: in the last line of the crawl-urlfilter file I replaced the - with a + so it does not ignore other information. Our site is www.woodward.edu, and woodward.edu is the domain. I was wondering if someone on this list could crawl this site, and only this domain, and see what they come up with. I am just stumped as to what to do next. I am running a nightly build from January 26th, 2006.
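For anyone following along, here is a rough Python sketch of why flipping that last line lets the crawl leave the domain. It assumes the filter file is a list of +regex / -regex lines where the first pattern found in a URL decides and an unmatched URL is rejected. The rule sets and the helper names (parse_rules, accepts) are made up for illustration and are not copied from Andy's file or from the Nutch source.

```python
import re

def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line[0] not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides; no match rejects."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

# Hypothetical rule set: accept hosts under woodward.edu, skip everything else.
DOMAIN_ONLY = r"""
-^(file|ftp|mailto):
+^http://([a-z0-9]*\.)*woodward\.edu/
-.
"""

# The same rules with the last line flipped from - to +, as described above.
FLIPPED = DOMAIN_ONLY.replace("\n-.\n", "\n+.\n")

for url in ("http://www.woodward.edu/index.asp", "http://www.example.com/"):
    print(url,
          accepts(parse_rules(DOMAIN_ONLY), url),
          accepts(parse_rules(FLIPPED), url))
```

Under these assumptions the final catch-all rule is the only thing keeping foreign hosts out, so turning it into +. accepts every URL that no earlier rule rejected.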
My criteria for our local search is to be able to search PDF, images, doc, and web content. You can go here and see what the search page pulls up: http://search.woodward.edu

Thanks for any help this list can provide.

Andy Morris

-----Original Message-----
From: Chris Mattmann [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 02, 2006 7:59 PM
To: [email protected]
Subject: RE: Xml?

Hi Andy,

> What is this error from?

Wow, super cool! You're the first post I've seen to the list regarding these log messages that I put in :-) For that matter, they're log warnings, not errors really:

> 060202 141539 ParserFactory:Plugin: parse-text mapped to contentType
> text/xml via parse-plugins.xml, but its plugin.xml file does not claim
> to support contentType: text/xml

This one says that you have the parse-text plugin mapped to the contentType "text/xml" in the parse-plugins.xml file. However, this is kind of weird because the plugin.xml file for the parse-text plugin does not claim to support "text/xml". So, it's just a warning.

> 060202 141539 ParserFactory:Plugin: parse-html mapped to contentType
> text/xml via parse-plugins.xml, but its plugin.xml file does not claim
> to support contentType: text/xml

Same issue here.

> 060202 141539 ParserFactory: Plugin: parse-rss mapped to contentType
> text/xml via parse-plugins.xml, but not enabled via plugin.includes in
> nutch-default.xml

This is another cool one (in my opinion :-) ). It says that you went ahead and mapped parse-rss to the contentType "text/xml" in parse-plugins.xml, however, you didn't enable parse-rss in the plugin.includes property in nutch-default.xml, or nutch-site.xml.

Does that make sense?

Cheers,
  Chris

>
> Andy
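The two kinds of warning Chris explains can be pictured as two consistency checks over the configuration. The Python sketch below is only a model of that logic, not the ParserFactory code: check_mappings and all the sample values are hypothetical, and it assumes plugin.includes is a regular expression that must match a whole plugin id.

```python
import re

def check_mappings(mappings, claimed_types, plugin_includes):
    """Return warning strings for content-type mappings that look inconsistent.

    mappings:        content type -> ordered plugin ids (models parse-plugins.xml)
    claimed_types:   plugin id -> content types it declares (models each plugin.xml)
    plugin_includes: regex of enabled plugin ids (models the plugin.includes property)
    """
    enabled = re.compile(plugin_includes)
    warnings = []
    for content_type, plugin_ids in mappings.items():
        for plugin_id in plugin_ids:
            if not enabled.fullmatch(plugin_id):
                # Mapped, but the plugin is never loaded.
                warnings.append(
                    f"{plugin_id} mapped to {content_type} "
                    "but not enabled via plugin.includes")
            elif content_type not in claimed_types.get(plugin_id, set()):
                # Loaded, but the plugin does not declare this content type.
                warnings.append(
                    f"{plugin_id} mapped to {content_type} "
                    f"but its plugin.xml does not claim {content_type}")
    return warnings

# Hypothetical values chosen to mirror the three warnings quoted in this thread.
mappings = {"text/xml": ["parse-text", "parse-html", "parse-rss"]}
claimed_types = {
    "parse-text": {"text/plain"},
    "parse-html": {"text/html"},
    "parse-rss": {"application/rss+xml", "text/xml"},
}
plugin_includes = r"protocol-http|urlfilter-regex|parse-(text|html)|index-basic"

for warning in check_mappings(mappings, claimed_types, plugin_includes):
    print(warning)
```

In this model the first two warnings go away if text/xml is no longer mapped to parse-text and parse-html, and the third goes away if parse-rss is added to the plugin.includes pattern.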
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
