frgrfg gfsdgffsd wrote:
Hi all,

I have a problem with the crawl/fetch of one website (www.lequipe.fr), although 
it works fine for another (www.lemonde.fr).

Here are the errors:
ERROR [MAT] 2006-11-22 00:36:20,860 - Http.invoke0(?) | 
java.lang.IllegalArgumentException: null metadata
ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
org.apache.nutch.protocol.Content.<init>(Content.java:60)
ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:196)
ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:162)

I don't understand why the metadata is null when there is metadata on the pages...

What version of Nutch are you running?


I also have this message just before:
INFO [MAT] 2006-11-22 00:36:32,477 - HttpBase.getProtocolOutput(194) | 
Skipping: http://www.lequipe.fr/ exceeds fetcher.max.crawl.delay, max=30, 
Crawl-Delay=120

and I can't find this property in nutch-site.xml

You need to add it there yourself to override the default (max=30 in your log):

<property>
 <name>fetcher.max.crawl.delay</name>
 <value>  your value here  </value>
</property>
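
For illustration only, here is how the entry might look with a concrete value filled in. The 120 is an assumption taken from the Crawl-Delay reported in the log line above (the value is in seconds), so anything at or above it should stop that page from being skipped. As far as I recall, -1 tells the fetcher never to skip on Crawl-Delay, but please check the description in nutch-default.xml for your version before relying on that. The property goes inside the existing root element of nutch-site.xml.

```xml
<!-- Sketch: raise the limit so a robots.txt Crawl-Delay of 120 s is accepted. -->
<!-- 120 is an assumed value taken from the log message, not a recommendation. -->
<property>
 <name>fetcher.max.crawl.delay</name>
 <value>120</value>
</property>
```

Keep in mind that the fetcher will then wait that long between requests to the host, so the crawl of that site will be slow.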

--
 Sami Siren
