There is already a JavaScript parser; you only need to switch it on.
On 03.02.2006 at 15:55, mos wrote:
The problem with www.gildemeister.com is that it uses JavaScript to
generate its links.
That's why Nutch can't find the other pages: the links are invisible
to it.
Two ideas:
- You need something like a sitemap that links to the other main pages.
If one isn't available yet, try to generate it (e.g. from the Apache
log file).
- Enhance the Nutch HTML parser so that it can interpret the
JavaScript links.
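The second idea, like the JavaScript parser mentioned at the top (in Nutch releases of this period, as far as we know, the parse-js plugin, which is enabled by listing it in the plugin.includes property), comes down to pulling URL-like strings out of script code. Below is a minimal Java sketch of that heuristic, assuming the links appear as quoted string literals that end in a page extension. The class JsLinkExtractor, its regular expression and the example.com URLs are hypothetical; this is not the plugin's actual code.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsLinkExtractor {

    // Assumed heuristic (not Nutch's rule): a single- or double-quoted literal
    // without whitespace that ends in a page-like extension, optionally
    // followed by a query string.
    private static final Pattern QUOTED_URL = Pattern.compile(
            "[\"']([^\"'\\s]+\\.(?:html?|php|aspx?|jsp)(?:\\?[^\"'\\s]*)?)[\"']");

    /** Returns absolute URLs for page-like string literals found in a script. */
    public static Set<String> extract(String script, String baseUrl) {
        Set<String> links = new LinkedHashSet<>(); // keeps discovery order, drops duplicates
        URI base = URI.create(baseUrl);
        Matcher m = QUOTED_URL.matcher(script);
        while (m.find()) {
            try {
                // resolve relative links against the page the script came from
                links.add(base.resolve(m.group(1)).toString());
            } catch (IllegalArgumentException e) {
                // the literal is not a syntactically valid URI; skip it
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical menu script of the kind that hides links from an HTML-only parser
        String js = "menu.add('Products', 'products/index.html');\n"
                  + "menu.add(\"Contact\", \"/contact.htm\");";
        for (String link : extract(js, "http://www.example.com/en/start.html")) {
            System.out.println(link);
        }
    }
}
```

Run as is, this prints http://www.example.com/en/products/index.html and http://www.example.com/contact.htm. A regex over string literals finds links that are written out literally but misses URLs assembled at run time, which only a real script interpreter could recover.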
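The sitemap idea could be automated roughly like this: read the web server's access log, keep the distinct paths of successful GET requests for HTML pages, and write them out as a plain HTML page of links that Nutch can start from. This is a minimal Java sketch, assuming Apache Common or Combined Log Format. The class name LogSitemap, the page filter (directories and .htm/.html only) and the log file location are hypothetical and would need adjusting for the real site.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSitemap {

    // Request line and status code as they appear in Common/Combined Log Format:
    // ... "GET /path HTTP/1.1" 200 ...
    private static final Pattern REQUEST =
            Pattern.compile("\"GET (\\S+) HTTP/[\\d.]+\" (\\d{3})");

    /** Collects the distinct paths of successful GET requests for page-like resources. */
    public static Set<String> pagePaths(List<String> logLines) {
        Set<String> paths = new TreeSet<>(); // sorted and de-duplicated
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find() && m.group(2).equals("200")) {
                String path = m.group(1);
                // hypothetical filter: keep directories and .htm/.html pages, skip images, CSS, scripts
                if (path.endsWith("/") || path.matches(".*\\.html?(\\?.*)?")) {
                    paths.add(path);
                }
            }
        }
        return paths;
    }

    /** Renders a minimal HTML page that links every collected path. */
    public static String toHtml(Set<String> paths) {
        StringBuilder sb = new StringBuilder("<html><body>\n");
        for (String p : paths) {
            String e = escape(p);
            sb.append("<a href=\"").append(e).append("\">").append(e).append("</a><br>\n");
        }
        return sb.append("</body></html>\n").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the access log (location is site-specific)
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.print(toHtml(pagePaths(lines)));
    }
}
```

The generated page would then have to be published on the site (or its URLs, prefixed with the host name, added to the root URL list) so that the crawl can start from it. Only pages that visitors actually requested will show up, so the log should cover a reasonably long period.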
Greetings
mos, from Munich
On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Hello,
I have a problem indexing a particular website:
http://www.gildemeister.com
Nutch fetches only 14 pages, not the complete site.
I'm using the default parameters and the intranet crawl command.
I get no errors. Could someone try to index the site and send me a
hint, or a config that works? I am new to Nutch and don't know where
to start fixing it.
Thanks
wombat