I think the file being indexed contains something like
javascript:something
This makes Nutch think javascript is a protocol, so a MalformedURLException
is thrown.
Try "javascript: something" (with a space after the colon),
or go into the code and ignore the MalformedURLException thrown here:

at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
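
As a rough sketch of the second option (this is not the actual Nutch code; the names OutlinkFilter, toUrlOrNull and keepParsable are made up for illustration), the exception could be caught and the offending link skipped. It assumes only that java.net.URL has no handler registered for the "javascript" protocol, which is what the stack trace indicates:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Nutch.
public class OutlinkFilter {

    // Returns the parsed URL, or null when java.net.URL rejects the string,
    // e.g. because no handler exists for its protocol ("javascript").
    public static URL toUrlOrNull(String candidate) {
        try {
            return new URL(candidate);
        } catch (MalformedURLException e) {
            return null; // skip the bad link instead of failing
        }
    }

    // Keeps only the candidates that java.net.URL can parse.
    public static List<URL> keepParsable(List<String> candidates) {
        List<URL> kept = new ArrayList<>();
        for (String candidate : candidates) {
            URL url = toUrlOrNull(candidate);
            if (url != null) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "http://example.com/page.html",
                "javascript:something");
        // Only the http link should survive the filter.
        System.out.println(keepParsable(links));
    }
}
```

Where exactly such a catch would go in Nutch depends on the version you are running, so treat this as a pattern rather than a patch.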


HTH, David

> Hi,
>
> I'm sorry, but I still can't index all the content of my web site.
> The log shows some errors:
>
> 2006-09-25 15:35:42,859 ERROR parse.OutlinkExtractor - getOutlinks
> java.net.MalformedURLException: unknown protocol: javascript
>  at java.net.URL.<init>(URL.java:574)
>  at java.net.URL.<init>(URL.java:464)
>  at java.net.URL.<init>(URL.java:413)
>  at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
>  at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
>  at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
>  at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
>  at org.apache.nutch.parse.text.TextParser.getParse(TextParser.java:47)
>  at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>  at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
>  at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)
>
> I don't clearly understand the configuration I need to set for the agent in
> the nutch-site.xml file.
>
> Could someone help me?
>
>
> ----- Original message -----
> From: Aïcha <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, 19 September 2006, 16:16:32
> Subject: problem with web site indexing
>
>
> Hi,
>
> I am trying to index a web site, including all of its pages,
> but the only page in the index is the first page, i.e. the page at the
> URL I put in the crawl's input file.
> In the end I have only one page in the index.
> Do I have to do something to make it work?
>
> Thanks in advance!
> Aïcha
