[Robots] Re: Perl and LWP robots

2002-03-07 Thread Tim Bray
At 11:31 AM 07/03/02 -0800, Nick Arnett wrote:
>> * Write it in Perl (or equivalent).
>
>I suppose it doesn't help with a book on Perl, but I'm re-writing my robots
>in Python and I'm very happy with the way it's going.
I consider Python to fall under "or equivalent" :)
>> * Consider

[Robots] The Black Hole of Ecartis

2002-03-07 Thread Nick Arnett
If you tried to send mail to either the robots or km list here in the last couple of hours, they may have been swallowed by the Black Hole of Ecartis. I was switching from Listar to the latest build of Ecartis (its new name) and got into a small nightmare of symbolic links and permissions. So, i

[Robots] Re: Perl and LWP robots

2002-03-07 Thread B Leong
People write spiders that potentially span all/any hosts to harvest those email addresses for the annoying spam, to see if trademarks are being used illegally, to see if copyrights are being violated, etc.
> The replies to my request for advice have been very helpful! I'll pick one
> and reply

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Avi Rappoport
At 3:43 PM -0700 3/7/02, Sean M. Burke wrote:
>The usefulness of the single-host spiders is pretty obvious to me.
>But why do people want to write spiders that potentially span all/any hosts?
>(Aside from people who are working for Google or similar.)
People think a robot can be an intelligent a
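Sean's distinction between single-host and host-spanning spiders comes down to a filter applied to every discovered URL. The following is a minimal Python sketch, not Perl/LWP code from the thread; the names `same_host`, `_effective_port`, and `DEFAULT_PORTS` are hypothetical, and it assumes candidate URLs have already been resolved to absolute form.

```python
from urllib.parse import urlsplit

# Default ports for the schemes a web robot normally handles (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}

def _effective_port(parts):
    """Explicit port if the URL has one, else the scheme's default port."""
    return parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)

def same_host(start_url: str, candidate_url: str) -> bool:
    """True if candidate_url is on the same host and port as start_url.

    Both URLs must already be absolute; urlsplit() lowercases .hostname,
    so "Example.com" and "example.com" compare equal.
    """
    a, b = urlsplit(start_url), urlsplit(candidate_url)
    return (a.hostname, _effective_port(a)) == (b.hostname, _effective_port(b))
```

A single-host spider would call `same_host(start_url, link)` before queueing each link; a host-spanning spider drops the check and relies on other limits such as depth or a URL budget.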

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Sean M. Burke
The replies to my request for advice have been very helpful! I'll pick one and reply to it:
At 10:01 2002-03-07 -0800, Otis Gospodnetic wrote:
>[about my forthcoming book]
>(i.e. I'm a potential customer :)) When will it be published?
It's probably going into tech edit later this month. So i

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Nick Arnett
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Tim Bray
[snip]
> * Write it in Perl (or equivalent).
I suppose it doesn't help with a book on Perl, but I'm re-writing my robots in Python and I'm very happy with the way it's going. Perform

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Klaus Johannes Rusch
In <[EMAIL PROTECTED]>, "Sean M. Burke" <[EMAIL PROTECTED]> writes:
> Aside from basic concepts (don't hammer the server; always obey the
> robots.txt; don't span hosts unless you are really sure that you want to),
> are there any particular bits of wisdom that list members would want me to
> pa
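"Don't hammer the server" is usually implemented as a minimum delay between requests to the same host. In Perl, LWP::RobotUA enforces such a delay itself; the Python sketch below only illustrates the idea. The class name `HostThrottle` and the 5-second default are assumptions for illustration, not values from this thread, and the clock is injectable so the logic can be checked without actually sleeping.

```python
import time
from urllib.parse import urlsplit

class HostThrottle:
    """Enforce a minimum delay (seconds) between requests to the same host.

    min_delay is an assumed policy value, not part of any standard.
    """

    def __init__(self, min_delay: float = 5.0, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock
        self._last = {}  # hostname -> clock() value of the last request

    def wait_time(self, url: str) -> float:
        """Seconds to wait before url's host may be contacted again."""
        last = self._last.get(urlsplit(url).hostname)
        if last is None:
            return 0.0
        return max(0.0, last + self.min_delay - self.clock())

    def record(self, url: str) -> None:
        """Note that a request to url's host is being made now."""
        self._last[urlsplit(url).hostname] = self.clock()

    def wait(self, url: str) -> None:
        """Sleep until url's host may be fetched, then record the fetch."""
        delay = self.wait_time(url)
        if delay > 0:
            time.sleep(delay)
        self.record(url)
```

A robot would call `throttle.wait(url)` immediately before each request. Keying on hostname means that different hosts never wait on each other.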

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Avi Rappoport
I've found that image maps, framesets, redirects, funky relative links, JavaScript links and dynamic URLs generated from backend systems are the main problems with robots. Also bad HTML on pages so the robot gets confused parsing it, such as unclosed tags. I have written up a checklist for
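Several of the problems Avi lists (image maps, framesets, relative links, JavaScript links, unclosed tags) surface during link extraction. Below is a hedged Python sketch using the standard library's lenient `html.parser`, which does not require tags to be closed, together with `urljoin` for relative links. The choice of link-bearing tags (`a`, `area`, `frame`, `iframe`), the skipping of `javascript:` and `mailto:` URLs, and the names `LinkExtractor` and `extract_links` are assumptions of this sketch; dynamic URLs from backend systems are not addressed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class LinkExtractor(HTMLParser):
    # tag -> attribute holding a link; covers image maps (area) and frames
    LINK_ATTRS = {"a": "href", "area": "href", "frame": "src", "iframe": "src"}

    def __init__(self, page_url: str):
        super().__init__(convert_charrefs=True)
        self.base = page_url  # relative links resolve against this
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "base" and attrs.get("href"):
            # <base href> changes how later relative links resolve
            self.base = urljoin(self.base, attrs["href"])
            return
        name = self.LINK_ATTRS.get(tag)
        value = attrs.get(name) if name else None
        if not value:
            return
        value = value.strip()
        if value.lower().startswith(("javascript:", "mailto:")):
            return  # nothing a plain robot can fetch
        # Resolve relative links and drop #fragments (same document).
        url, _fragment = urldefrag(urljoin(self.base, value))
        self.links.append(url)

def extract_links(page_url: str, html: str) -> list:
    """Return absolute, fragment-free link URLs found in html."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Because `HTMLParser` reports start tags as it sees them, a missing `</b>` or `</p>` does not stop later links from being found.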

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Matthew Meadows
That's a curious remark about readers and their misplaced desire for recursive spiders. A recursive spider allows its user to drill down into a particular information domain and ultimately exhaust it if the spider is capable enough. This is of enormous benefit to the information researcher look

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Michael Lange
Hi Sean, You might want to consider exploring the "not yet approved" updated robots.txt standard that covers allow rules and how to apply them to your spider. This may help raise the level of awareness on the robots.txt standard. You could also talk about how to use the robots.txt with your spid
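To illustrate the Allow rules Michael mentions, Python's standard `urllib.robotparser` understands `Allow` lines. The robots.txt content, the `ExampleBot/0.1` agent, and the `allowed` helper below are made up for the example. This parser applies the first matching rule in file order, while other parsers may choose the longest matching path; placing the narrower `Allow` before the broader `Disallow` gives the same answer under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an Allow line carves one file out of a
# broader Disallow.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public.html
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

def allowed(path: str, agent: str = "ExampleBot/0.1") -> bool:
    """Ask the parsed rules whether agent may fetch path on example.com."""
    return rp.can_fetch(agent, "http://example.com" + path)

for path in ("/index.html", "/private/public.html", "/private/secret.html"):
    print(path, allowed(path))
```

A real robot would call `set_url()` and `read()` to fetch the site's own robots.txt instead of parsing a string.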

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Otis Gospodnetic
Excellent. I have a copy of Wong's book at home and like that topic (i.e. I'm a potential customer :)) When will it be published? I think lots of people do want to know about recursive spiders, and I bet one of the most frequent obstacles are issues like: queueing, depth vs. breadth first crawl
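Otis's questions about queueing and depth-first versus breadth-first crawling can be made concrete with a small breadth-first frontier: a FIFO queue of (URL, depth) pairs plus a set of URLs already queued. This Python sketch is illustrative only. `crawl_order` and its `get_links` callback are hypothetical, with `get_links` standing in for fetching and parsing a page. Swapping `popleft()` for `pop()` turns the queue into a LIFO stack and pushes the crawl deeper first, although with URLs marked at enqueue time the result is not a strict depth-first order.

```python
from collections import deque

def crawl_order(start, get_links, max_depth=2):
    """Return URLs in the order a breadth-first crawler would visit them.

    get_links(url) -> iterable of absolute URLs found on that page
    (hypothetical; a real robot would fetch and parse the page here).
    """
    frontier = deque([(start, 0)])  # FIFO queue -> breadth-first
    seen = {start}  # mark on enqueue so no URL is queued twice
    order = []
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return order
```

For example, with an invented graph `{"/": ["/a", "/b"], "/a": ["/a1", "/b"], "/b": ["/"], "/a1": ["/a2"]}` and `max_depth=2`, the crawler visits `/`, `/a`, `/b`, `/a1` and never fetches `/a2`. The `seen` set is what prevents the `/b` to `/` link from causing a loop.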

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Chris Skepper
> Aside from basic concepts (don't hammer the server; always obey the
> robots.txt; don't span hosts unless you are really sure that you want to),
> are there any particular bits of wisdom that list members would want me to
> pass on to my readers?
Look at http://www.robotstxt.org/wc/guidelin

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Tim Bray
At 02:51 AM 07/03/02 -0700, Sean M. Burke wrote:
>Aside from basic concepts (don't hammer the server; always obey the
>robots.txt; don't span hosts unless you are really sure that you want to),
>are there any particular bits of wisdom that list members would want me to
>pass on to my readers?

[Robots] Perl and LWP robots

2002-03-07 Thread Sean M. Burke
Hi all! My name is Sean Burke, and I'm writing a book for O'Reilly, which is to basically replace the Clinton Wong's now out-of-print /Web Client Programming with Perl/. In my book draft so far, I haven't discussed actual recursive spiders (I've only discussed getting a given page, and then