As far as I know, JSParseFilter is a heuristic parser, and I haven't read very good things about its performance. JavaScript can be really hard for any parser to "understand", especially (but not only) when minified.
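To illustrate why heuristic link extraction struggles with JavaScript, here is a minimal sketch. It is not Nutch's actual code; the regex, the `extract_links` function, and the sample snippets are all made up for the example. It assumes a heuristic that simply treats quoted strings that look like URLs as links:

```python
import re

# Hypothetical heuristic (an assumption for illustration, not Nutch's logic):
# any quoted string starting with "http://", "https://" or "/" is a link.
URL_LIKE = re.compile(r"""["']((?:https?://|/)[^"'\s]+)["']""")


def extract_links(js_source):
    """Return the URL-like string literals found in a piece of JavaScript."""
    return [match.group(1) for match in URL_LIKE.finditer(js_source)]


# The link is a complete string literal: the heuristic finds it.
literal_js = 'function next() { location.href = "/board/list.jsp?page=2"; }'

# The link is assembled at run time: only a truncated fragment is found.
dynamic_js = 'function go(p) { location.href = "/board/list.jsp?page=" + p; }'

# The link is built entirely from pieces: nothing URL-like is visible.
pieces_js = 'var u = "list" + "." + "jsp";'

print(extract_links(literal_js))
print(extract_links(dynamic_js))
print(extract_links(pieces_js))
```

A text-based heuristic of this kind never executes the script, so a "next page" link that only exists after string concatenation or a function call will come out truncated or not at all, which may be what is happening with the bulletin board.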
Just out of curiosity, can you post the JavaScript code you're trying to parse?

S

----------------------------------
"Anyone proposing to run Windows on servers should be prepared to explain
what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham

"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)

----- Original Message -----
> From: eric park <[email protected]>
> To: [email protected]
> Sent: Mon 7 June 2010, 03:13:06
> Subject: javascript crawling
>
> Hello, I'm trying to crawl a local intranet using nutch-1.0. The
> difficulty comes from crawling a bulletin board. The bulletin board
> consists of JavaScript code. Nutch must use the JSParseFilter to parse
> the JavaScript and move on to the next bulletin-board page to parse its
> contents, but the crawler doesn't extract the proper link. Has anyone had
> a similar experience? Any help will be appreciated. Thank you!

