[EMAIL PROTECTED] wrote:
I have seen all kinds of HTML attributes altered by the CyberNeko HTML parser.
"Not extracting outlinks" may be caused by the 'href=' attribute being changed.
This may have to do with its ability to "fix up" HTML markup.
Interestingly, it does not happen in a single-threaded run.

That's no good. It sounds like the CyberNeko stuff may not be completely thread-safe. Can someone look into this more? Thanks.


Doug


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
