Hi, I'm sorry, but I still can't get all the content of my web site indexed. The log contains errors such as:
2006-09-25 15:35:42,859 ERROR parse.OutlinkExtractor - getOutlinks
java.net.MalformedURLException: unknown protocol: javascript
    at java.net.URL.<init>(URL.java:574)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
    at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
    at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
    at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
    at org.apache.nutch.parse.text.TextParser.getParse(TextParser.java:47)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)

I don't clearly understand what configuration I have to set for the agent in the nutch-site.xml file. Could someone help me?

----- Original Message ----
From: Aïcha <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, 19 September 2006, 16:16:32
Subject: problem with web site indexing

Hi,

I am trying to index a web site with all of its pages, but the only page I get in the index is the first page, that is, the page of the URL I put in the input file for the crawl. In the end I have only one page in the index. Do I have to do something to make it work?

Thanks in advance!
Aïcha
