You haven't set up proper URL filters. Typically you'd have URL filters that
only pass the protocols you need (e.g. http and https), which also keeps out
empty URLs such as '' that have no protocol at all.
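A minimal sketch of conf/regex-urlfilter.txt, assuming the urlfilter-regex
plugin is enabled in plugin.includes (it is in the default Nutch
configuration). Each rule is '+' (accept) or '-' (reject) followed by a Java
regex. The first matching rule decides, and a URL that matches no rule is
rejected. So without a trailing catch-all "+." an empty URL never gets
through. The example.org restriction is hypothetical; substitute your own
sites.

    # Skip common non-web protocols explicitly.
    -^(file|ftp|mailto):
    # Hypothetical: accept only http/https URLs on the site being crawled.
    +^https?://([a-z0-9-]+\.)*example\.org/
    # No "+." catch-all here: anything without a matching protocol,
    # including the empty string '', matches no rule and is dropped.

If you want to crawl any http/https site, replace the example.org line
with +^https?:// instead.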

On Thursday 15 December 2011 23:48:50 mina wrote:
> I crawl sites with Nutch 1.3. I see this exception in my log when Nutch
> crawls my sites:
> 
>     Malformed URL: '', skipping (java.net.MalformedURLException: no protocol:
>       at java.net.URL.<init>(URL.java:567)
>       at java.net.URL.<init>(URL.java:464)
>       at java.net.URL.<init>(URL.java:413)
>       at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:247)
>       at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216) )
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Malformed-URL-skipping-java-net-MalformedURLException-tp3590159p3590159.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex