There is already a JavaScript parser; you only need to switch it on.

On 03.02.2006 at 15:55, mos wrote:

The problem at www.gildemeister.com is that it uses JavaScript to generate
its links.
That is why Nutch can't find the other pages: the links are invisible to
the HTML parser.
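
To illustrate why such links stay invisible, here is a minimal Python sketch. It is not Nutch code: the sample page, the class name HrefCollector, and the regular expression are all made up for illustration. An ordinary HTML link collector only sees <a href> attributes, while a rough heuristic that scans script blocks for quoted strings that look like page paths also finds the JavaScript target.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one ordinary link, one link that exists only in JavaScript.
PAGE = """<html><body>
<a href="/index.html">Home</a>
<script type="text/javascript">
function go() { window.location = "/products/overview.html"; }
</script>
<span onclick="go()">Products</span>
</body></html>"""


class HrefCollector(HTMLParser):
    """Collects the href values of <a> tags, as a plain HTML parser would."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def html_links(page):
    """Links visible to a parser that only looks at HTML anchors."""
    collector = HrefCollector()
    collector.feed(page)
    return collector.hrefs


# Heuristic (an assumption, not a real JavaScript interpreter): treat any
# quoted string that starts with "/" and ends in .htm or .html as a link.
JS_URL = re.compile(r"""["'](/[^"'\s]+\.html?)["']""")


def script_links(page):
    """Links found by scanning the text of <script> blocks."""
    scripts = re.findall(r"<script[^>]*>(.*?)</script>", page, flags=re.S | re.I)
    return [url for script in scripts for url in JS_URL.findall(script)]


if __name__ == "__main__":
    print(html_links(PAGE))    # only the ordinary anchor
    print(script_links(PAGE))  # the target hidden in the script
```

A heuristic like this can miss links that are assembled from several string pieces at run time, so it is only a rough approximation.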
Two ideas:
- You need something like a sitemap that links to the other main pages.
  If one is not available right now, you could try to generate it (e.g.
  from the Apache log file).
- Enhance the Nutch HTML parser so that it can interpret the JavaScript
  links.
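
The first idea could be sketched roughly as follows, assuming the server writes the usual Apache common or combined log format. The function names, the sample base URL, and the choice to keep only requests answered with status 200 are my own assumptions. The result is a plain HTML page of ordinary links that a crawler can follow.

```python
import re
from html import escape

# Matches the request and status fields of a common/combined log line, e.g.
# ... "GET /en/products.html HTTP/1.1" 200 1234
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def paths_from_log(lines):
    """Return the sorted, de-duplicated paths that were served with status 200."""
    paths = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "200":
            # Drop the query string so each page is listed only once.
            paths.add(match.group(1).split("?", 1)[0])
    return sorted(paths)


def sitemap_html(base_url, paths):
    """Build a simple HTML page with one ordinary <a href> link per path."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(base_url + path, quote=True))
        for path in paths
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>"


if __name__ == "__main__":
    # Hypothetical file name; point this at the real access log.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        print(sitemap_html("http://www.example.com", paths_from_log(log)))
```

The generated page would then be saved somewhere on the site and used as a start URL for the crawl. You will probably want to filter out images, stylesheets and similar files first.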

Greetings
mos, from munich



On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Hello,

I have problems indexing a special internet site:
http://www.gildemeister.com

Nutch only fetches 14 pages but not the complete site.

I'm using the default parameters and the intranet crawl command.

I get no errors. Could someone try to index the site and send me a hint,
or a config that works? I am new to Nutch and I don't know where to
start fixing it.

thanks

wombat




