Howie Wang wrote:
> I think you have to hack the parse-html plugin. Look at the
> getOutlinks method in DOMContentUtils.java. You'll probably have to
> look for targets that start with "javascript:" and do some string
> replacing.

The latest SVN version already has a JavaScript link extractor
(JSParseFilter in the parse-js plugin). Currently it extracts JS
snippets from HTML event attributes (onload, onclick, onmouseover,
etc.) and, of course, from <script> elements. The only thing missing
for your case is a clause that handles "javascript:" URLs in any other
attribute.

I can make this change. Watch the commit messages so that you know when
to sync your source.
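
To illustrate the kind of clause I mean, here is a rough, standalone
sketch. It is not the JSParseFilter code: the class name JsHrefSketch,
the method extractCandidates, and the "contains a slash or a dot"
heuristic are all assumptions made up for this example. It only shows
the idea of taking an attribute value that starts with "javascript:"
and pulling out quoted string literals that look like link targets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: illustrates the idea only.
public class JsHrefSketch {
    private static final String PREFIX = "javascript:";

    // Matches a single- or double-quoted string literal.
    // Simplification: escaped quotes inside literals are not handled.
    private static final Pattern QUOTED =
        Pattern.compile("'([^']*)'|\"([^\"]*)\"");

    // Returns quoted literals from a "javascript:" attribute value that
    // look like link targets. Returns an empty list for other values.
    public static List<String> extractCandidates(String attrValue) {
        List<String> out = new ArrayList<>();
        if (attrValue == null) {
            return out;
        }
        String v = attrValue.trim();
        // Case-insensitive check for the "javascript:" prefix.
        if (!v.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return out;
        }
        Matcher m = QUOTED.matcher(v.substring(PREFIX.length()));
        while (m.find()) {
            String s = m.group(1) != null ? m.group(1) : m.group(2);
            // Assumed heuristic: a link target contains '/' or '.'.
            if (s.contains("/") || s.contains(".")) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractCandidates(
            "javascript:openWin('http://example.com/a.html', 'popup')"));
    }
}
```

A real filter would also have to resolve relative targets against the
page's base URL and cope with string concatenation in the script, which
this sketch ignores.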
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com