My Nutch is on my localhost and seems to be running fine. Here is what is very strange: out of 18 websites I put in crawl-urlfilter.txt and in my urls folder, only 3 websites come up in the search, and one of them is not even on my list. Very weird! Please take a look at my configuration (below) and see if you have any suggestions. (I suspect that I need to recrawl or something, but the recrawl script on the Nutch wiki didn't work. Also, shouldn't Google results come up too?)

The 3 websites that are searched are http://www.horse.com, http://en.wikipedia.org, and this one, which is not on my list: http://www.ansi.okstate.edu/.

Also, if I edit /opt/apache-tomcat-5.5.16/webapps/nutch-0.8.1/WEB-INF/classes/nutch-site.xml, then my Nutch doesn't search at all! It just says "Hits 0-0 (out of about 0 total matching pages):".
Here are some files I edited:

crawl-urlfilter.txt

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*en.wikipedia.org/
+^http://([a-z0-9]*\.)*www.google.com/
+^http://([a-z0-9]*\.)*www.search.yahoo.com/
+^http://([a-z0-9]*\.)*www.apache.org/
+^http://([a-z0-9]*\.)*www.yahoo.com/
+^http://([a-z0-9]*\.)*www.amazon.com/
+^http://([a-z0-9]*\.)*www.about.com/
+^http://([a-z0-9]*\.)*www.bartleby.com/
+^http://([a-z0-9]*\.)*www.cnn.com/
+^http://([a-z0-9]*\.)*www.download.com/
+^http://([a-z0-9]*\.)*www.reference.com/
+^http://([a-z0-9]*\.)*www.weather.com/
+^http://([a-z0-9]*\.)*www.nih.gov/
+^http://([a-z0-9]*\.)*www.usa.gov/
+^http://([a-z0-9]*\.)*www.monster.com/
+^http://([a-z0-9]*\.)*www.time.com/time/
+^http://([a-z0-9]*\.)*www.boerwar.us

shoppinglist.txt (in the urls folder)

http://en.wikipedia.org
http://www.google.com
http://search.yahoo.com/
http://www.yahoo.com/
http://www.amazon.com/
http://www.about.com/
http://www.bartleby.com/
http://www.cnn.com/
http://www.download.com/
http://www.reference.com/
http://www.wikipedia.org/
http://www.weather.com/
http://www.nih.gov/
http://www.usa.gov/
http://www.monster.com/
http://www.time.com/time/
http://boerwar.us

the nutch-site.xml (/usr/nutch-0.8.1/conf)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>searcher.dir</name>
    <value>"/usr/nutch-0.8.1/crawl/"</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>
  <property>
    <name>http.agent.name</name>
    <value>Kate</value>
    <description>Kate H.</description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>Nutch spiderman</value>
    <description>Nutch spiderman</description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>MyEmail</value>
    <description>[EMAIL PROTECTED]</description>
  </property>
</configuration>

Thanks in advance-

--
View this message in context: http://www.nabble.com/Something-very%2C-very-strange....about-how-my-nutch-runs...-please-help%21-tp17840748p17840748.html
Sent from the Nutch - User mailing list archive at Nabble.com.
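
One way to sanity-check the two lists in this message against each other is to test every seed URL from shoppinglist.txt against the accept rules from crawl-urlfilter.txt. The Python sketch below is not part of Nutch and makes several assumptions: it covers only the "+" rules quoted in this message (not any skip rules elsewhere in the file), it treats a seed as accepted if any rule is found in it, and the helper names with_root_path and unmatched_seeds are made up for illustration. It also assumes a bare host such as http://boerwar.us is normalized to end in "/" before filtering.

```python
import re
from urllib.parse import urlsplit

# Accept rules copied from the crawl-urlfilter.txt in this message,
# with the leading "+" removed.
ACCEPT_PATTERNS = [
    r"^http://([a-z0-9]*\.)*en.wikipedia.org/",
    r"^http://([a-z0-9]*\.)*www.google.com/",
    r"^http://([a-z0-9]*\.)*www.search.yahoo.com/",
    r"^http://([a-z0-9]*\.)*www.apache.org/",
    r"^http://([a-z0-9]*\.)*www.yahoo.com/",
    r"^http://([a-z0-9]*\.)*www.amazon.com/",
    r"^http://([a-z0-9]*\.)*www.about.com/",
    r"^http://([a-z0-9]*\.)*www.bartleby.com/",
    r"^http://([a-z0-9]*\.)*www.cnn.com/",
    r"^http://([a-z0-9]*\.)*www.download.com/",
    r"^http://([a-z0-9]*\.)*www.reference.com/",
    r"^http://([a-z0-9]*\.)*www.weather.com/",
    r"^http://([a-z0-9]*\.)*www.nih.gov/",
    r"^http://([a-z0-9]*\.)*www.usa.gov/",
    r"^http://([a-z0-9]*\.)*www.monster.com/",
    r"^http://([a-z0-9]*\.)*www.time.com/time/",
    r"^http://([a-z0-9]*\.)*www.boerwar.us",
]

# Seed URLs copied from shoppinglist.txt in this message.
SEEDS = [
    "http://en.wikipedia.org",
    "http://www.google.com",
    "http://search.yahoo.com/",
    "http://www.yahoo.com/",
    "http://www.amazon.com/",
    "http://www.about.com/",
    "http://www.bartleby.com/",
    "http://www.cnn.com/",
    "http://www.download.com/",
    "http://www.reference.com/",
    "http://www.wikipedia.org/",
    "http://www.weather.com/",
    "http://www.nih.gov/",
    "http://www.usa.gov/",
    "http://www.monster.com/",
    "http://www.time.com/time/",
    "http://boerwar.us",
]


def with_root_path(url):
    """Append "/" to a URL with no path (an assumed normalization step)."""
    return url + "/" if urlsplit(url).path == "" else url


def unmatched_seeds(patterns, seeds):
    """Return the seeds that none of the accept patterns match."""
    compiled = [re.compile(p) for p in patterns]
    return [
        seed
        for seed in seeds
        if not any(rx.search(with_root_path(seed)) for rx in compiled)
    ]


if __name__ == "__main__":
    for seed in unmatched_seeds(ACCEPT_PATTERNS, SEEDS):
        print("no accept rule matches:", seed)
```

Under those assumptions, the seeds this reports are the ones whose host has no "www." prefix or a different subdomain than the rule written for it. That would not by itself explain why only 3 sites show up in search, but it narrows down which seeds the filter could be dropping.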
