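You haven't set up proper URL filters. You'd typically have URL filters that pass only the protocols you need. The URL in your log is an empty string (''), which has no protocol at all, hence the MalformedURLException.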
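
As a rough sketch, assuming you use the regex URL filter (the urlfilter-regex plugin) with its rules in conf/regex-urlfilter.txt, a protocol whitelist could look like the fragment below. The protocol list is only an assumption for illustration; adjust it to what you actually crawl.

```
# Each rule is '+' (accept) or '-' (reject) followed by a regex.
# The first rule that matches a URL decides; a URL that matches
# no rule at all is rejected.

# accept only http and https URLs (assumed protocol list)
+^https?://

# explicitly reject anything else
-.
```

With rules like these, an entry without a protocol matches no accept rule and should be filtered out rather than passed on.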

On Thursday 15 December 2011 23:48:50 mina wrote:
> i crawl sites with nutch 1.3. i see this exception in my log when nutch
> crawl my sites:
>
> Malformed URL: '', skipping (java.net.MalformedURLException: no protocol:
>     at java.net.URL.<init>(URL.java:567)
>     at java.net.URL.<init>(URL.java:464)
>     at java.net.URL.<init>(URL.java:413)
>     at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:247)
>     at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> )
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Malformed-URL-skipping-java-net-MalformedURLException-tp3590159p3590159.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Markus Jelsma - CTO - Openindex

