Re: javascript crawling

Stefano Cherchi Mon, 07 Jun 2010 04:26:51 -0700

As far as I know, JSParsefilter is a heuristic parser, and I haven't read very
good things about its performance. JavaScript can be really hard for any
parser to "understand", especially (but not only) if it is minified.
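To show why a heuristic approach can miss links, here is a minimal sketch of the general idea: scan the script for string literals and keep the ones that look like URLs. This is NOT the JSParsefilter source. The class name `JsLinkSketch` and both regex patterns are hypothetical assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a heuristic JavaScript link extractor.
 * Not Nutch's actual code: it only illustrates "find string literals,
 * keep the ones that look like links".
 */
public class JsLinkSketch {

    // A single- or double-quoted literal with no quotes, backslashes or
    // whitespace inside (a deliberate simplification of real JS strings).
    private static final Pattern STRING_LITERAL =
            Pattern.compile("([\"'])([^\"'\\\\\\s]+)\\1");

    // Assumed heuristic: an absolute http(s) URL, a root-relative path,
    // or something ending in a common page extension (optionally with a query).
    private static final Pattern LOOKS_LIKE_URL =
            Pattern.compile("^(https?://\\S+|/\\S*|\\S+\\.(html?|jsp|php|asp)(\\?\\S*)?)$");

    /** Returns every string literal in jsSource that looks like a link. */
    public static List<String> extractCandidates(String jsSource) {
        List<String> links = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(jsSource);
        while (m.find()) {
            String literal = m.group(2);
            if (LOOKS_LIKE_URL.matcher(literal).matches()) {
                links.add(literal);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // The whole URL is a literal, so the heuristic finds it.
        String direct = "function next(){ location.href = '/board/list.jsp?page=2'; }";
        // The URL is assembled at runtime: only a fragment is a literal.
        String built = "function goPage(n){ location.href = 'list.jsp?page=' + n; }";

        System.out.println(extractCandidates(direct)); // prints [/board/list.jsp?page=2]
        System.out.println(extractCandidates(built));  // prints [list.jsp?page=]
    }
}
```

The second case is the typical failure: when a board builds its "next page" URL by concatenation, a literal-based heuristic only sees the incomplete fragment `list.jsp?page=`, so the crawler gets a wrong link (or none at all) instead of the real page.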


Just out of curiosity, can you post the javascript code you're trying to parse?

S

---------------------------------- 
"Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham


"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)



----- Original message -----
> From: eric park <[email protected]>
> To: [email protected]
> Sent: Mon, June 7, 2010, 03:13:06
> Subject: javascript crawling
> 
> Hello,
> 
> I'm trying to crawl a local intranet using nutch-1.0. The difficulty comes
> from crawling a bulletin board. The bulletin board consists of JavaScript
> code. Nutch must use the JSParsefilter to parse the JavaScript and move on
> to the next bulletin-board page to parse its contents, but the crawler
> doesn't extract the proper link. Has anyone had a similar experience? Any
> help will be appreciated.
> 
> Thank you!
