Thanks for your answer. How do I set up proper URL filters?
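
For reference, a minimal sketch of what such a filter might look like. It assumes the regex URL filter plugin (urlfilter-regex) is enabled and reads its rules from conf/regex-urlfilter.txt; the file location and the exact rule set are assumptions for illustration, not something confirmed in this thread.

```text
# Sketch of conf/regex-urlfilter.txt (assumed file name and plugin setup).
# Each rule is '+' (accept) or '-' (reject) followed by a regular expression.
# Rules are tried top to bottom and the first match decides.

# Reject protocols this crawl does not need.
-^(file|ftp|mailto):

# Accept only http and https URLs.
+^https?://
```

As far as I know, a URL that matches no rule is rejected, so with these rules an empty URL ('') would be dropped by the filter rather than reaching the generator.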

On Fri, Dec 16, 2011 at 3:42 AM, Markus Jelsma-2 [via Lucene] wrote:

> You haven't set up proper URL filters. You'd typically have URL filters that
> only pass the protocols you need.
>
> On Thursday 15 December 2011 23:48:50 mina wrote:
>
> > I crawl sites with Nutch 1.3. I see this exception in my log when Nutch
> > crawls my sites:
> >
> >     Malformed URL: '', skipping (java.net.MalformedURLException: no protocol:
> > at java.net.URL.<init>(URL.java:567)
> > at java.net.URL.<init>(URL.java:464)
> > at java.net.URL.<init>(URL.java:413)
> > at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:247)
> > at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
> > at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216))
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Malformed-URL-skipping-java-net-MalformedURLException-tp3590159p3590159.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Markus Jelsma - CTO - Openindex
>
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Malformed-URL-skipping-java-net-MalformedURLException-tp3590159p3591831.html
Sent from the Nutch - User mailing list archive at Nabble.com.
