Re: HTML::Parser plaintext tag

2004-11-12 Thread Alex Kapranoff
* Alex Kapranoff [EMAIL PROTECTED] [November 11 2004, 11:11]: It results in weird effects for me as I write an HTML sanitizer for WebMail. How come? Do you have a need to suppress this behaviour in HTML::Parser? Yes, I'd like to have an option to resume parsing after `</plaintext>' just
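
For context, a minimal sketch of the behaviour under discussion, assuming the HTML::Parser 3.x event API (the input string is made up): once a plaintext tag is seen, the rest of the document is reported as text, so a later `</plaintext>' does not bring tag parsing back.

    use strict;
    use warnings;
    use HTML::Parser;

    my $html = '<p>before</p><plaintext>raw</plaintext><b>still raw</b>';

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { print "START $_[0]\n" }, 'tagname' ],
        text_h  => [ sub { print "TEXT  $_[0]\n" }, 'text'    ],
    );
    $p->parse($html);
    $p->eof;

    # Reports START p, TEXT before, START plaintext -- and then
    # everything after <plaintext>, including </plaintext> and <b>,
    # comes through as text only.

(Later HTML::Parser releases added a closing_plaintext attribute that makes a `</plaintext>' end tag effective, so parsing resumes after it.)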

Re: [PATCH] Caching/reusing WWW::RobotRules(::InCore)

2004-11-12 Thread Gisle Aas
Ville Skyttä [EMAIL PROTECTED] writes: The current behaviour of LWP::RobotUA, when passed an existing WWW::RobotRules::InCore object, is counterintuitive to me. I am of this opinion because of the documentation of $rules in LWP::RobotUA->new() and WWW::RobotRules->agent(), as well as the
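
For reference, a minimal sketch of the calls involved (agent name and addresses are made up). WWW::RobotRules->new() returns an in-core rules object, and LWP::RobotUA->new() accepts one as its optional third argument; note that the documented behaviour of WWW::RobotRules->agent() is to discard already-collected rules whenever the agent name changes, which is easy to trip over when reusing a rules object.

    use strict;
    use warnings;
    use LWP::RobotUA;
    use WWW::RobotRules;

    # One in-core rules cache, created up front...
    my $rules = WWW::RobotRules->new('MyBot/1.0');

    # ...and handed to the robot UA as the optional third argument.
    # If omitted, LWP::RobotUA creates a fresh WWW::RobotRules itself.
    my $ua = LWP::RobotUA->new('MyBot/1.0', 'me@example.com', $rules);
    $ua->delay(1);    # minutes to wait between requests to one host

    # Reusing $rules across user agents only pays off if their agent
    # names match; WWW::RobotRules->agent() flushes the cache otherwise.
    my $res = $ua->get('http://www.example.com/');
    print $res->status_line, "\n";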

Re: Patch for WWW::RobotRules.pm

2004-11-12 Thread Gisle Aas
Bill Moseley [EMAIL PROTECTED] writes: I've got a spider that uses LWP::RobotUA (WWW::RobotRules), and a few users of the spider have complained that the warning messages were not obvious enough. I guess I can agree, because when they are spidering multiple hosts the message doesn't tell them
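
To illustrate the complaint, a small sketch (hosts and agent name are made up): when a loop like this touches several hosts, a bare warning such as "Unexpected line" from the robots.txt parser gives no clue which host produced it.

    use strict;
    use warnings;
    use LWP::RobotUA;

    my $ua = LWP::RobotUA->new('MySpider/0.1', 'me@example.com');
    $ua->delay(0.1);    # minutes to wait between requests to one host

    # Visiting a new host fetches and parses its robots.txt as a side
    # effect; a malformed file on any of them warns without naming the
    # host, which is what users found hard to act on.
    for my $url ('http://one.example.com/', 'http://two.example.com/') {
        my $res = $ua->get($url);
        print "$url: ", $res->status_line, "\n";
    }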

Re: WWW::RobotRules warning could be more helpful

2004-11-12 Thread Gisle Aas
[EMAIL PROTECTED] writes: If you spider several sites and one of them has a broken robots.txt file, you can't tell which one, since the warning doesn't tell you. This will be better in 5.801. I've applied a variation of Bill Moseley's suggested patch for the same problem. Around line 73 of
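
As a rough illustration of the fix (the URL below is made up, and the exact message wording in 5.801 may differ): WWW::RobotRules->parse() already receives the robots.txt URL as its first argument, so the warning can simply include it. Run this under perl -w, since the module guards its warnings with $^W.

    use strict;
    use warnings;
    use WWW::RobotRules;

    my $rules = WWW::RobotRules->new('MyBot/1.0');

    # parse() gets the robots.txt URL along with the content, so a
    # warning of the form "RobotRules <URL>: Unexpected line: ..." can
    # name the broken file instead of leaving the host a mystery.
    my $robots_url = 'http://broken.example.com/robots.txt';
    my $txt = join "\n",
        'User-agent: *',
        'Disallow: /private',
        'this line is not valid robots.txt syntax',
        '';
    $rules->parse($robots_url, $txt);

    print $rules->allowed('http://broken.example.com/private')
        ? "allowed\n" : "disallowed\n";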