The problem with www.gildemeister.com is that it generates its links
with JavaScript.
That's why Nutch can't find the other pages: the links don't appear as
ordinary links in the HTML, so they are invisible to the parser.
Two ideas:
- You need something like a sitemap that links to the other main pages.
  If one isn't available right now, you could try to generate it
  (e.g. from the Apache log file).
- Enhance the Nutch HTML parser so that it can interpret the JavaScript
  links.
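
For the first idea, here is a rough sketch of how such a page could be
generated from an access log. It assumes the server writes the standard
Apache common or combined log format; the function names
(urls_from_log, to_html_sitemap), the list of skipped extensions and the
sample log lines are made up for illustration and are not taken from the
real site.

```python
import html
import re

# Matches the request and status fields of an Apache common/combined
# log line, e.g.: "GET /en/products.html HTTP/1.1" 200 5120
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')

# Assumption: these extensions are assets, not pages. Adjust as needed.
SKIP_EXT = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def urls_from_log(lines, base_url):
    """Return sorted, de-duplicated URLs of pages served with status 200."""
    urls = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a GET request or not a recognisable log line
        path, status = match.groups()
        path = path.split("?", 1)[0]  # drop query strings
        if status == "200" and not path.lower().endswith(SKIP_EXT):
            urls.add(base_url.rstrip("/") + path)
    return sorted(urls)


def to_html_sitemap(urls):
    """Render the URLs as a plain HTML page of ordinary <a href> links."""
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (html.escape(u), html.escape(u))
        for u in urls
    )
    return "<html><body><ul>\n%s\n</ul></body></html>" % items


# Hypothetical log lines, only to show the expected input shape.
sample = [
    '1.2.3.4 - - [03/Feb/2006:10:00:00 +0100] "GET /en/products.html HTTP/1.1" 200 5120',
    '1.2.3.4 - - [03/Feb/2006:10:00:01 +0100] "GET /img/logo.gif HTTP/1.1" 200 900',
    '1.2.3.4 - - [03/Feb/2006:10:00:02 +0100] "GET /missing.html HTTP/1.1" 404 300',
]
print(to_html_sitemap(urls_from_log(sample, "http://www.gildemeister.com")))
```

The resulting page contains only plain links, so a crawler that ignores
JavaScript can still follow them. The same URL list, written one per
line, could probably also be fed to the crawl directly as start URLs.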

Greetings
mos, from munich



On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have problems indexing a special internet site:
> http://www.gildemeister.com
>
> Nutch only fetches 14 pages but not the complete site.
>
> I'm using the default parameters and the intranet crawl command.
>
> I get no errors. Could someone try to index the site and send me a
> hint, or a config that works? I am new to Nutch and I don't know
> where to start fixing it.
>
> thanks
>
> wombat
>
>
>
