Hi, I ran Nutch against my localhost web site. The application has JavaScript files that create dynamic URLs. My question is: what should I configure so that Nutch recognizes these URLs and crawls the site completely?
Below is part of the log file that Nutch generates.

fetching https://www.localhost/script/ShockwaveFlash.ShockwaveFlash.
fetching https://www.localhost/script/
fetching https://www.localhost/script/webtv/2.6
fetching https://www.localhost/script/_level0/_root
fetching https://www.localhost/script/betslip.aspx
fetching https://www.localhost/shared/script/+s_c2fe(c.substring(o+1,e))+
fetching https://www.localhost/shared/script/)<0)||oc.indexOf(
fetching https://www.localhost/shared/script/+m).indexOf(
fetching https://www.localhost/shared/script/c.indexOf(\
fetching https://www.localhost/shared/script/);else{if(s.ismac&&s.u.indexOf(
fetching https://www.localhost/registration.aspx#
fetch of https://www.localhost/shared/script/)<0)||oc.indexOf( failed with: java.lang.IllegalArgumentException: Invalid uri 'https://www.localhost/shared/script/)<0)||oc.indexOf(': escaped absolute path not valid

Thanks.
Stjepan
