My Nutch is on my localhost and seems to be running fine.
Here is what is very strange: out of the 18 websites I put in
crawl-urlfilter.txt and in my urls folder, only 3 come up in the
search, and one of them is not even on my list. Very weird! Please take a
look at my configuration (below) and see if you have any suggestions. (I
suspect that I need to recrawl or something, but the recrawl script on the
Nutch wiki didn't work. Also, shouldn't Google results come up too?)
The 3 websites that show up in the search are http://www.horse.com,
http://en.wikipedia.org, and this one, which is not on my list:
http://www.ansi.okstate.edu/.
Also, if I edit
/opt/apache-tomcat-5.5.16/webapps/nutch-0.8.1/WEB-INF/classes/nutch-site.xml,
then my Nutch doesn't search at all! It just says "Hits 0-0 (out of about 0
total matching pages)".

Here are some files I edited:
crawl-urlfilter.txt
 accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*en.wikipedia.org/
+^http://([a-z0-9]*\.)*www.google.com/
+^http://([a-z0-9]*\.)*www.search.yahoo.com/
+^http://([a-z0-9]*\.)*www.apache.org/
+^http://([a-z0-9]*\.)*www.yahoo.com/
+^http://([a-z0-9]*\.)*www.amazon.com/
+^http://([a-z0-9]*\.)*www.about.com/
+^http://([a-z0-9]*\.)*www.bartleby.com/
+^http://([a-z0-9]*\.)*www.cnn.com/
+^http://([a-z0-9]*\.)*www.download.com/
+^http://([a-z0-9]*\.)*www.reference.com/
+^http://([a-z0-9]*\.)*www.weather.com/
+^http://([a-z0-9]*\.)*www.nih.gov/
+^http://([a-z0-9]*\.)*www.usa.gov/
+^http://([a-z0-9]*\.)*www.monster.com/
+^http://([a-z0-9]*\.)*www.time.com/time/
+^http://([a-z0-9]*\.)*www.boerwar.us

shoppinglist.txt (in the urls folder)
http://en.wikipedia.org
http://www.google.com
http://search.yahoo.com/
http://www.yahoo.com/
http://www.amazon.com/
http://www.about.com/
http://www.bartleby.com/
http://www.cnn.com/
http://www.download.com/
http://www.reference.com/
http://www.wikipedia.org/
http://www.weather.com/
http://www.nih.gov/
http://www.usa.gov/
http://www.monster.com/
http://www.time.com/time/
http://boerwar.us
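
A quick way to compare the two lists above is to run each seed URL through the filter patterns. The following is a minimal sketch, not Nutch's own code. It assumes that the filter tries the rules in order, that the first rule whose regex is found anywhere in the URL decides, and that a URL matching no rule is dropped. It also assumes that an empty path is normalized to "/" before filtering. The helper names are hypothetical, and only a subset of the rules and seeds is copied in.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Subset of the '+' rules from crawl-urlfilter.txt above.
RULES = [
    r"+^http://([a-z0-9]*\.)*en.wikipedia.org/",
    r"+^http://([a-z0-9]*\.)*www.search.yahoo.com/",
    r"+^http://([a-z0-9]*\.)*www.yahoo.com/",
    r"+^http://([a-z0-9]*\.)*www.boerwar.us",
]

# Subset of the seed URLs from shoppinglist.txt above.
SEEDS = [
    "http://en.wikipedia.org",
    "http://search.yahoo.com/",
    "http://www.yahoo.com/",
    "http://www.wikipedia.org/",
    "http://boerwar.us",
]


def with_root_path(url):
    """Assumed normalization: turn an empty path into '/'."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path or "/"))


def is_accepted(url, rules):
    """First rule whose regex is found in the URL decides (assumed semantics)."""
    candidate = with_root_path(url)
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, candidate):
            return sign == "+"
    return False  # no rule matched, so the URL is dropped


def unmatched_seeds(seeds, rules):
    """Return the seeds that no '+' rule accepts."""
    return [url for url in seeds if not is_accepted(url, rules)]


if __name__ == "__main__":
    for url in unmatched_seeds(SEEDS, RULES):
        print("no accepting rule for:", url)
```

With the subset shown, this flags http://search.yahoo.com/, http://www.wikipedia.org/ and http://boerwar.us, because the corresponding patterns require a literal "www" in front of the host name or name a different host.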

the nutch-site.xml (/usr/nutch-0.8.1/conf)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>searcher.dir</name>
        <value>"/usr/nutch-0.8.1/crawl/"</value>
</property>
<property>
        <name>plugin.includes</name>
        <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
<property>
        <name>http.agent.name</name>
        <value>Kate</value>
        <description>Kate H.</description>
</property>
<property>
        <name>http.agent.description</name>
        <value>Nutch spiderman</value>
        <description>Nutch spiderman</description>
</property>
<property>
        <name>http.agent.email</name>
        <value>MyEmail</value>
        <description>[EMAIL PROTECTED]</description>
</property>

</configuration>
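
One thing that may be worth checking in the file above is whether any value is wrapped in literal quote characters, as searcher.dir is. Assuming the text of a <value> element is taken as-is, the quotes would become part of the path. The sketch below is only an illustration of that check: the function name is hypothetical, and the sample is a shortened copy of the properties above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the nutch-site.xml properties shown above.
SAMPLE = """<configuration>
<property>
        <name>searcher.dir</name>
        <value>"/usr/nutch-0.8.1/crawl/"</value>
</property>
<property>
        <name>http.agent.name</name>
        <value>Kate</value>
</property>
</configuration>
"""


def quoted_values(xml_text):
    """Return {name: value} for properties whose value starts and ends with a quote."""
    flagged = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            flagged[name] = value
    return flagged


if __name__ == "__main__":
    for name, value in quoted_values(SAMPLE).items():
        print(name, "has a quoted value:", value)
```

For the sample, only searcher.dir is reported. Whether that explains the missing results is a guess, not something confirmed here.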


Thanks in advance- 



