There is already a JavaScript parser; you only need to switch it on.
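For anyone wondering what such a parser has to do: roughly, it scans the script text for string literals that look like URLs and hands them to the crawler as outlinks. The following is only a minimal sketch of that idea, not Nutch's actual code. The class name JsLinkSketch, the method extractLinks, and the list of page extensions are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: pulls URL-like string
// literals out of JavaScript source so a crawler could follow them.
class JsLinkSketch {

    // A quoted string (single or double quotes) that either starts with
    // http:// or https://, or ends in a typical page extension.
    // The extension list is an assumption chosen for this example.
    private static final Pattern URL_LITERAL = Pattern.compile(
        "([\"'])((?:https?://[^\"'\\s]+)|(?:[^\"'\\s]+\\.(?:html?|php|jsp)))\\1");

    static List<String> extractLinks(String script) {
        // LinkedHashSet keeps first-seen order and drops duplicates.
        Set<String> found = new LinkedHashSet<>();
        Matcher m = URL_LITERAL.matcher(script);
        while (m.find()) {
            found.add(m.group(2)); // group 2 is the text between the quotes
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String script =
            "function go() { window.location = '/products/index.html'; }\n"
            + "var home = \"http://www.example.com/start\";";
        for (String link : extractLinks(script)) {
            System.out.println(link);
        }
    }
}
```

A heuristic like this misses links that the script assembles from several pieces at run time, so it is a complement to a sitemap rather than a replacement for one.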
On 03.02.2006 at 15:55, mos wrote:
The problem at www.gildemeister.com is that the links are generated
with JavaScript.
That's why Nutch can't find the other pages (the links are invisible
to the HTML parser).
Two ideas:
- You need something like a sitemap that links to the other main pages.
If one is not available right now, you could try to generate it
(e.g. from the Apache log file).
- Enhance the Nutch HTML parser so that it can interpret the
JavaScript links.
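The first idea can be sketched roughly as follows, assuming the access log is in the Common Log Format. The class name LogSitemapSketch, its method names, and the base URL passed on the command line are hypothetical. The sketch collects every path that was requested with GET and answered with status 200, then writes a plain HTML page of links that can be used as a seed page for the crawl.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a simple HTML link page from an access log.
class LogSitemapSketch {

    // Matches the request and status fields of a Common Log Format line,
    // e.g. ... "GET /en/products.html HTTP/1.1" 200 5120
    private static final Pattern REQUEST =
        Pattern.compile("\"GET (\\S+) HTTP/[0-9.]+\" (\\d{3})");

    // Returns the sorted, de-duplicated paths that were served with status 200.
    static Set<String> pagesFromLog(List<String> logLines) {
        Set<String> paths = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find() && m.group(2).equals("200")) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }

    // Renders the paths as plain <a href> links. No HTML escaping is done,
    // so paths containing characters like & or " would need extra handling.
    static String toHtml(String baseUrl, Set<String> paths) {
        StringBuilder html = new StringBuilder("<html><body>\n");
        for (String path : paths) {
            html.append("<a href=\"").append(baseUrl).append(path).append("\">")
                .append(path).append("</a><br>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    // Usage: java LogSitemapSketch <access-log-file> <base-url>
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.print(toHtml(args[1], pagesFromLog(lines)));
    }
}
```

In practice you would probably also want to filter out images, stylesheets, and script files before writing the page.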
Greetings
mos, from munich
On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Hello,
I have problems indexing a special internet site:
http://www.gildemeister.com
Nutch only fetches 14 pages but not the complete site.
I'm using the default parameters and the intranet crawl command.
I don't get any errors. Could someone try to index the site and
send me a hint?
Or a config that works? I am new to Nutch and I don't know where
to start fixing it.
thanks
wombat
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general