I should add that in the HTML source the link is a relative URL, e.g. (/cfp/call?conference=artificial%20intelligence&page=2)
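
To make clear what I expect to happen, here is a minimal sketch of how that relative link should resolve against the page it appears on. It uses Python's standard `urllib.parse.urljoin` purely as an illustration. It is not Nutch code, and the variable names are my own. I am also assuming the page sets no `<base href>`, so the page URL itself is the base.

```python
from urllib.parse import urljoin

# URL of the page being parsed (the seed URL from my crawl).
page_url = "http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&skip=1"

# The href exactly as it appears in the HTML source: a relative URL
# that starts at the site root.
href = "/cfp/call?conference=artificial%20intelligence&page=2"

# Standard relative-reference resolution: scheme and host come from the
# page URL, path and query come from the href.
resolved = urljoin(page_url, href)
print(resolved)
```

The result should be the absolute `page=2` URL listed in my original message, so I would expect it to show up as an outlink of the seed page.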

Regards,
MyD


MyD wrote:
>
> Hi @ all,
>
> I'd like to run an intranet crawl with my own plugin on the domain
> www.wikicfp.com.
> (http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&skip=1)
>
> The problem is that nutch doesn't find the important urls, so nutch can't
> crawl further...
> (http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=2)
> (http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=3)
> (http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=4)
> (http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=
> ....)
>
> Any suggestions?
>
> nutch-site.xml
>
> <property>
>   <name>plugin.includes</name>
>   <value>my-plugin|protocol-http|parse-(html|js)|index-basic</value>
>   <description>
>   </description>
> </property>
>
> I commented all urlfilter files (regex etc..) in conf/.
>
> Thanks in advance.
>
> Regards,
> MyD
>
>

--
View this message in context: http://www.nabble.com/Nutch-doesn%27t-find-all-urls..-Any-suggestion--tp22599690p22599904.html
Sent from the Nutch - User mailing list archive at Nabble.com.