Hi Andrzej,

Well, I think for now I'll just disable the parse-js plugin since I don't really need it anyway. I'll let you know if I ever work on it (I may need it in the future).
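
If the heuristic is revisited at some point, one possible direction might be to sanity-check each extracted string before it is treated as a URL. The following is only a rough sketch of that idea, not how parse-js works today. It is written in Python for brevity; the function name `looks_like_url` and the set of rejected characters are assumptions made up for this illustration, and it assumes candidates have already been resolved to absolute URLs.

```python
from urllib.parse import urlsplit

# Characters that tend to show up in JavaScript or HTML fragments but not
# in a plain link. This particular set is an assumption for illustration;
# it deliberately leaves out ';' and '()' since real URLs can contain them.
SUSPICIOUS_CHARS = set(" \t\r\n<>\"'{}")


def looks_like_url(candidate):
    """Return True if the candidate string plausibly is an absolute http(s) URL."""
    # Reject empty strings and anything containing markup or code characters.
    if not candidate or any(ch in SUSPICIOUS_CHARS for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
        host = parts.hostname
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    # Require a web scheme and a non-empty host.
    return parts.scheme in ("http", "https") and bool(host)
```

A filter like this would drop strings such as `http://example.com/<b>bold</b>` or `javascript:void(0)` while keeping ordinary links. It trades some recall for precision, so it would still need tuning against real crawl data.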
Thanks,
--Flo

Andrzej Bialecki wrote:

> Florent Gluck wrote:
>
>> Some urls are totally bogus. I didn't investigate what could be causing
>> this yet, but it looks like it could be a parsing issue. Some urls
>> contain some javascript code and others contain some html tags.
>
> This is a side-effect of our primitive parse-js, which doesn't really
> parse anything, just uses some heuristic to extract possible URLs.
> Unfortunately, often as not the strings it extracts don't have
> anything to do with URLs.
>
> If you have suggestions on how to improve it I'm all ears.

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
