The problem at www.gildemeister.com is that the site uses JavaScript to generate its links. That's why nutch can't find the other pages: the links are invisible to its HTML parser. Two ideas:
- You need something like a sitemap that links to the other main pages. If one isn't available yet, try to generate it (e.g. from the Apache log file).
- Enhance the nutch HTML parser so that it can interpret the JavaScript links.
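Both ideas can be sketched outside of nutch. This is a minimal, standalone Python sketch, not a nutch plugin. A real parser extension would have to be written in Java against nutch's plugin interfaces, which this does not attempt. The function names `seed_urls_from_log` and `js_links` are hypothetical. The log parsing assumes the Apache common/combined log format. The JavaScript patterns are guesses, because I haven't checked exactly how the site builds its links. The first function turns the access log into a list of distinct page URLs, one per line, which you could put into the seed URL file for the intranet crawl. The second function shows the kind of pattern matching a JavaScript-aware parser would need to do.

```python
import re
from urllib.parse import urljoin, urlparse

# Apache common/combined log format (assumption): the request line is the first
# quoted field, followed by the status code, e.g. "GET /en/a.html HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"\s+(\d{3})')

# Static assets that should not become seed URLs (assumption: adjust for the site).
ASSET_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")

# Hypothetical JavaScript link styles, e.g.
#   onclick="window.location.href='/en/service.html'"
#   href="javascript:openPage('/en/news.html')"
#   window.open('/en/contact.html', '_blank')
JS_URL_RE = re.compile(
    r"""(?:location(?:\.href)?\s*=\s*|window\.open\(\s*|javascript:\w+\(\s*)['"]([^'"]+)['"]"""
)


def seed_urls_from_log(lines, base_url):
    """Return distinct absolute URLs of pages served with status 200, in log order."""
    urls = []
    seen = set()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m is None or m.group(2) != "200":
            continue  # not a request line, or not a successful response
        path = m.group(1)
        if urlparse(path).path.lower().endswith(ASSET_EXTENSIONS):
            continue  # skip images, stylesheets, scripts
        url = urljoin(base_url, path)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def js_links(html, page_url):
    """Return absolute URLs found in the JavaScript link patterns above."""
    return [urljoin(page_url, u) for u in JS_URL_RE.findall(html)]
```

To build the seed file, you could read the log with `open("access.log")`, pass the lines to `seed_urls_from_log(lines, "http://www.gildemeister.com/")`, and write one URL per line. The regex approach in `js_links` only catches links whose URL appears as a string literal. Links assembled at runtime from variables would still need a real JavaScript interpreter.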
Greetings from Munich,
mos

On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have problems indexing a particular website:
> http://www.gildemeister.com
>
> Nutch only fetches 14 pages, not the complete site.
>
> I'm using the default parameters and the intranet crawl command.
>
> I get no errors. Can someone try to index the site and send me a hint?
> Or a config that works? I am new to nutch and I don't know where to
> start fixing it.
>
> thanks
>
> wombat
