Hi Andrzej,

In my previous projects, I bound JavaScript functions to a central URL, and I know that idea does not fit Nutch.
I am not familiar with the Rhino engine, but JDK 6 is said to have adopted it as its embedded JavaScript engine. Can we build a RhinoInterpreter first and then evaluate the JavaScript functions to get the result, rather than extracting plain text as we do now?

You can find Javadoc about Rhino here: http://xmlgraphics.apache.org/batik/javadoc/index.html

Regards
/Jack

On 3/14/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Florent Gluck wrote:
> > Some urls are totally bogus. I didn't investigate what could be causing
> > this yet, but it looks like it could be a parsing issue. Some urls
> > contain some javascript code and others contain some html tags.
> >
>
> This is a side-effect of our primitive parse-js, which doesn't really
> parse anything, just uses some heuristic to extract possible URLs.
> Unfortunately, often as not the strings it extracts don't have anything
> to do with URLs.
>
> If you have suggestions on how to improve it I'm all ears.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
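
To make the problem quoted above concrete, here is a minimal sketch of a quoted-string heuristic of the kind Andrzej describes, plus one possible sanity check on its output. This is not the actual parse-js code: the class name JsUrlCandidates, the method names, and the filter rules are all hypothetical, and the filter is only an assumption about what a cheap improvement could look like. It uses only the standard Java library.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not Nutch's parse-js implementation.
public class JsUrlCandidates {
    // Contents of single- or double-quoted string literals (no escape handling).
    private static final Pattern QUOTED =
            Pattern.compile("\"([^\"]*)\"|'([^']*)'");

    // Characters that suggest HTML markup or JavaScript code rather than a URL.
    private static final Pattern SUSPICIOUS = Pattern.compile("[<>(){};\\s]");

    /** Naive heuristic: treat every non-empty quoted literal as a possible URL. */
    public static List<String> extractCandidates(String script) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(script);
        while (m.find()) {
            String s = m.group(1) != null ? m.group(1) : m.group(2);
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Assumed filter: accept only absolute http(s) URIs with a host and no code-like characters. */
    public static boolean looksLikeUrl(String candidate) {
        if (SUSPICIOUS.matcher(candidate).find()) {
            return false;
        }
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Candidates from the naive heuristic that survive the filter. */
    public static List<String> extractUrls(String script) {
        List<String> out = new ArrayList<>();
        for (String c : extractCandidates(script)) {
            if (looksLikeUrl(c)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String script = "var a = '<b>hello</b>'; "
                + "window.open(\"http://example.com/page.html\"); "
                + "var f = 'foo(); bar();';";
        System.out.println(extractCandidates(script));
        System.out.println(extractUrls(script));
    }
}
```

On the sample script, the naive step returns all three literals, including the HTML fragment and the JavaScript snippet, while the filtered step keeps only http://example.com/page.html. The filter is deliberately strict: it also drops relative links and legitimate URLs that contain parentheses or semicolons, which is one reason actually evaluating the script with an engine such as Rhino may be the better long-term route.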
