frgrfg gfsdgffsd wrote:
Hi all,
I have a problem with the crawl/fetch of one website (www.lequipe.fr), although
it works fine for another (www.lemonde.fr).
Here are the errors:
ERROR [MAT] 2006-11-22 00:36:20,860 - Http.invoke0(?) | java.lang.IllegalArgumentException: null metadata
ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at org.apache.nutch.protocol.Content.<init>(Content.java:60)
ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:196)
ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:162)
I don't understand why the metadata is null when there is metadata on the pages...
What version of Nutch are you running?
I also have this message just before:
INFO [MAT] 2006-11-22 00:36:32,477 - HttpBase.getProtocolOutput(194) | Skipping: http://www.lequipe.fr/ exceeds fetcher.max.crawl.delay, max=30, Crawl-Delay=120
and I can't find this property in nutch-site.xml.
You need to add it there.
<property>
<name>fetcher.max.crawl.delay</name>
<value> your value here </value>
</property>
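
For example, the log message reports Crawl-Delay=120 against max=30, so a
minimal sketch of the override could look like the one below. The value 120
is only an illustration taken from that log line, and it assumes the fetcher
skips a page only when the Crawl-Delay is strictly greater than the limit.
The <configuration> root element is also an assumption about the layout of
your nutch-site.xml; keep whatever root element your file already uses.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of the Nutch defaults -->
<!-- Root element assumed here; keep the one already in your file -->
<configuration>
  <property>
    <name>fetcher.max.crawl.delay</name>
    <!-- Illustrative value in seconds, matching the Crawl-Delay=120 from the log -->
    <value>120</value>
  </property>
</configuration>
```

Keep in mind that a larger limit means the fetcher may wait that long between
requests to the same host, so the crawl of that site will be slow.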
--
Sami Siren