I think the file being indexed contains something like javascript:something. This makes Nutch think javascript is a protocol, so it throws a MalformedURLException. Try "javascript: something" (with a space), or go into the code and ignore the MalformedURLException
at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)

hth
david

> Hi,
>
> I'm sorry, but I still haven't succeeded in indexing all the content of my
> web site.
> In the log I have some errors:
>
> 2006-09-25 15:35:42,859 ERROR parse.OutlinkExtractor - getOutlinks
> java.net.MalformedURLException: unknown protocol: javascript
>     at java.net.URL.<init>(URL.java:574)
>     at java.net.URL.<init>(URL.java:464)
>     at java.net.URL.<init>(URL.java:413)
>     at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
>     at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
>     at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
>     at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
>     at org.apache.nutch.parse.text.TextParser.getParse(TextParser.java:47)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)
>
> I don't clearly understand the configuration I have to make for the agent
> in the nutch-site.xml file.
>
> Could someone help me?
>
>
> ----- Original Message ----
> From: Aïcha <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, 19 September 2006, 16:16:32
> Subject: problem with web site indexing
>
>
> Hi,
>
> I am trying to index a web site with all the pages of the site,
> but the only page I get in the index is the first page, that is, the page
> of the URL I put in the input file for the crawl.
> At the end I have only one page in the index.
> So do I have to do something to make it work?
>
> Thanks in advance!
> Aïcha
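
For illustration, here is a minimal Java sketch of the second option from the reply (ignoring the MalformedURLException). It is not Nutch's actual code: the class name OutlinkFilterSketch and the method keepParsableUrls are hypothetical, and the sketch only assumes that java.net.URL rejects the javascript: scheme the way the stack trace above shows.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: it shows the "catch and skip" idea only.
class OutlinkFilterSketch {

    // Returns the candidates that java.net.URL accepts; anything that raises
    // MalformedURLException (such as "javascript:something") is skipped.
    static List<String> keepParsableUrls(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            try {
                new URL(candidate); // throws for protocols without a handler
                kept.add(candidate);
            } catch (MalformedURLException e) {
                // Ignore the bad link instead of logging an ERROR for it.
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "http://example.com/page.html", // assumed sample link
                "javascript:something");        // the kind of link from the log
        System.out.println(keepParsableUrls(found));
    }
}
```

Skipping the bad link only silences this error for links that were never crawlable anyway; it would not by itself explain why only one page ends up in the index.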
