Hi Andrzej,

Well, I think for now I'll just disable the parse-js plugin since I
don't really need it anyway.
I'll let you know if I ever work on it (I may need it in the future).

Thanks,
--Flo

Andrzej Bialecki wrote:

> Florent Gluck wrote:
>
>> Some URLs are totally bogus.  I haven't investigated what could be causing
>> this yet, but it looks like a parsing issue.  Some URLs contain
>> JavaScript code and others contain HTML tags.
>>
>
> This is a side effect of our primitive parse-js, which doesn't really
> parse anything; it just uses a heuristic to extract possible URLs.
> Unfortunately, more often than not the strings it extracts don't have
> anything to do with URLs.
>
> If you have suggestions on how to improve it I'm all ears.
>
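
One possible direction for the improvement asked about above is a stricter
filter applied after the heuristic extraction, so that strings containing
JavaScript or HTML fragments are dropped before they become outlinks. The
following is only a minimal sketch in Python, not the actual parse-js code
(Nutch itself is Java). The function name looks_like_url, the list of
suspicious characters, and the rule for relative links are all assumptions
made for illustration.

```python
import re
from urllib.parse import urlsplit

# Characters that commonly appear in JavaScript or HTML fragments but not
# in a well-formed, unescaped URL (an assumption of this sketch).
_SUSPICIOUS = re.compile(r"[\s<>\"'(){};\\]")

# Conservative pattern for relative links such as "/docs/page.html".
# It requires a leading "/", "./" or "../" so that property accesses
# like "window.location" are not mistaken for links.
_RELATIVE = re.compile(r"^\.{0,2}/[\w\-./%~]+(\?[\w\-.%=&~]*)?$")


def looks_like_url(candidate: str) -> bool:
    """Return True if a heuristically extracted string plausibly is a URL."""
    if not candidate or _SUSPICIOUS.search(candidate):
        return False
    try:
        parts = urlsplit(candidate)
        host = parts.hostname or ""
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. bad IPv6 brackets.
        return False
    if parts.scheme in ("http", "https"):
        # Require something that resembles a host name.
        return "." in host or host == "localhost"
    if parts.scheme:
        # Other schemes (javascript:, mailto:, ...) are not crawlable here.
        return False
    return bool(_RELATIVE.match(candidate))
```

A filter like this trades recall for precision: it would discard some
legitimate links (for example bare relative names such as "page.html"), so
the rules would need tuning against real crawl data.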



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
