Thanks for your answer. How do I set up proper URL filters?

On Fri, Dec 16, 2011 at 3:42 AM, Markus Jelsma-2 [via Lucene] <[email protected]> wrote:
> You haven't set up proper URL filters. You'd typically have URL filters that
> only pass the protocols you need.
>
> On Thursday 15 December 2011 23:48:50 mina wrote:
> > I crawl sites with Nutch 1.3. I see this exception in my log when Nutch
> > crawls my sites:
> >
> > Malformed URL: '', skipping (java.net.MalformedURLException: no protocol:
> >     at java.net.URL.<init>(URL.java:567)
> >     at java.net.URL.<init>(URL.java:464)
> >     at java.net.URL.<init>(URL.java:413)
> >     at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:247)
> >     at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > )
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Malformed-URL-skipping-java-net-MalformedURLException-tp3590159p3590159.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Markus Jelsma - CTO - Openindex
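To illustrate "URL filters that only pass the protocols you need", here is a minimal sketch, not Nutch source code. It assumes the rule format used by the urlfilter-regex plugin's conf/regex-urlfilter.txt in stock Nutch 1.x: one rule per line, '+' (accept) or '-' (reject) followed by a Java regex, the first rule whose regex is found in the URL decides, and a URL that matches no rule is rejected. The class and method names (UrlFilterSketch, parse, filter, parsesAsUrl) are hypothetical, and the two example rules are only one possible configuration; please check them against the regex-urlfilter.txt shipped in your own conf/ directory.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Nutch code: mimics the assumed regex-urlfilter.txt
// semantics ('+' accept / '-' reject, first matching rule wins, no match = reject).
public class UrlFilterSketch {

    /** One rule: accept ('+') or reject ('-') plus the regex searched in the URL. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses lines like "+^https?://"; blank lines and '#' comments are skipped. */
    static List<Rule> parse(String[] lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
        return rules;
    }

    /** Returns the URL if the first matching rule accepts it, otherwise null. */
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // no rule matched: reject
    }

    /** True if java.net.URL can parse the string; "" fails with "no protocol". */
    static boolean parsesAsUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example rules: pass only http and https, reject everything else.
        String[] config = {
            "# accept only http and https URLs",
            "+^https?://",
            "# reject everything else",
            "-.",
        };
        List<Rule> rules = parse(config);
        String[] candidates = {
            "http://example.com/", "https://example.com/page", "ftp://example.com/", "file:///tmp/x", ""
        };
        for (String url : candidates) {
            String verdict = filter(rules, url) != null ? "ACCEPT" : "REJECT";
            System.out.println(verdict + " '" + url + "' (java.net.URL parses it: " + parsesAsUrl(url) + ")");
        }
    }
}
```

In this sketch the empty string from the log matches no rule, so it is rejected by the filter, while new URL("") on its own throws the same "no protocol" MalformedURLException seen in the stack trace. Where exactly Nutch applies its filters during generation is not modelled here.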

