Re: [Robots] New robots.txt rules

2007-12-04 Thread Tim Bray
Thanks, Danny. I was thinking "WTF!?!?" -T On Dec 4, 2007 3:59 AM, Danny Sullivan <[EMAIL PROTECTED]> wrote: > To be clear, these are not new robots.txt rules. It is a proposed new > standard that none of the major search engines but Exalead supports. This > explains more: http://searc

Re: [Robots] Googlebot, msnbot, and robots.txt refresh

2006-03-26 Thread Tim Bray
On Mar 26, 2006, at 7:25 AM, <[EMAIL PROTECTED]> wrote: Hi, Googlebot and msnbot are supposed to obey robots.txt, but they are ignoring my robots.txt (http://simpy.com/robots.txt), that contains: Looks like a bug to me -Tim

Re: [Robots] Is this mailing list alive?

2003-11-04 Thread Tim Bray
On Nov 3, 2003, at 11:16 PM, Nick Arnett wrote: [EMAIL PROTECTED] wrote: I've created a robot, www.dead-links.com and i wonder if this list is alive. It is alive, but very, very quiet. Yeah, this robots thing is just a fad, it'll never catch on. -Tim

Re: [Robots] Efficient crawling of mailing list archives?

2003-02-28 Thread Tim Bray
y what you want. -- Cheers, Tim Bray (ongoing fragmented essay: http://www.tbray.org/ongoing/)

Re: [Robots] Post

2002-11-08 Thread Tim Bray
Paul Maddox wrote: As an AI programmer specialising in NLP, personally I'd like to see web bots actually 'understanding' the content they review, rather than indexing by brute force. How about the equivalent of Dmoz or Yahoo Directory, but generated by a web spider? Which as a side-effect coul

[Robots] Re: better language for writing a Spider ?

2002-03-15 Thread Tim Bray
Sean M. Burke wrote: > In short, if people want to see improvements to LWP, email me and say what > you want done For robots, you need a call that says "fetch this URL, but get a maximum of XX bytes and spend a maximum of YY seconds doing it." Return status should tell you whether it finishe
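
A rough sketch of the kind of call described above, written in Python rather than as an LWP API. The names read_limited and fetch_limited and the three status strings are made up for illustration. The deadline is only checked between reads, so a single blocked read is bounded by the socket timeout, not by the overall budget.

```python
import time
import urllib.request


def read_limited(stream, max_bytes, max_seconds, chunk_size=8192,
                 clock=time.monotonic):
    """Read from a file-like object until EOF, the byte cap, or the deadline.

    Returns (data, status). status is one of the illustrative strings
    "complete", "truncated_bytes", or "timed_out".
    """
    deadline = clock() + max_seconds
    chunks = []
    total = 0
    while True:
        if clock() >= deadline:
            return b"".join(chunks), "timed_out"
        remaining = max_bytes - total
        if remaining == 0:
            # Cap reached: probe one byte to see whether more was available.
            probe = stream.read(1)
            status = "truncated_bytes" if probe else "complete"
            return b"".join(chunks), status
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            return b"".join(chunks), "complete"
        chunks.append(chunk)
        total += len(chunk)


def fetch_limited(url, max_bytes, max_seconds):
    """Fetch url with a byte cap and a time budget (hypothetical wrapper)."""
    # timeout bounds each blocking socket operation, not the whole transfer.
    with urllib.request.urlopen(url, timeout=max_seconds) as response:
        return read_limited(response, max_bytes, max_seconds)
```

A caller would use it as data, status = fetch_limited(url, 100_000, 10) and decide from status whether the partial body is worth indexing.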

[Robots] Re: better language for writing a Spider ?

2002-03-15 Thread Tim Bray
srinivas mohan wrote: > can you help me suggesting any open source compilers > to compile my java code to native code... I suggest that this is unlikely to help. Whenever a computer program is not running fast enough, the first step MUST BE to measure it and understand why. Use a profiler.
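
To illustrate "measure first": the thread is about Java, but the same habit can be sketched in Python with the standard cProfile and pstats modules. The helper name profile_call is made up for this example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under the profiler; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```

Printing the report for one crawl of a small site usually shows whether the time goes to network waits, parsing, or storage, which is what decides whether a faster language would help at all.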

[Robots] Re: better language for writing a Spider ?

2002-03-14 Thread Tim Bray
At 10:36 AM 14/03/02 -0800, Nick Arnett wrote: > I wish >I could be more specific, but I never did figure out what was really going >on. Following an LWP request through the debugger is a long and convoluted >journey... I totally agree with Nick that when LWP works, it's OK, but when it doesn

[Robots] Re: better language for writing a Spider ?

2002-03-14 Thread Tim Bray
At 09:47 AM 14/03/02 -0800, srinivas mohan wrote: >Now as the performance is low..we wanted to redevelop >our spider..in a language like c or perl...and use >it with our existing product.. > >I will be thankful if any one can help me choosing >the better language..where i can get better >perfor

[Robots] Re: matching and "UserAgent:" in robots.txt

2002-03-14 Thread Tim Bray
Sean M. Burke wrote: > I'm a bit perplexed over whether the current Perl library WWW::RobotRules > implements a certain part of the Robots Exclusion Standard correctly. So > forgive me if this seems a simple question, but my reading of the Robots > Exclusion Standard hasn't really cleared it
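
The question itself is cut off in this excerpt. As a general illustration of User-agent matching only, here is how Python's standard urllib.robotparser (not WWW::RobotRules) resolves a named record against the "*" record. The robots.txt content, bot names, and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for "foobot", one catch-all record.
ROBOTS_TXT = """User-agent: foobot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# This parser compares the product token before "/" case-insensitively.
foobot_public = parser.can_fetch("FooBot/1.0", "http://example.com/public.html")
foobot_private = parser.can_fetch("FooBot/1.0", "http://example.com/private/x")
other_public = parser.can_fetch("OtherBot/2.0", "http://example.com/public.html")
```

Here the named record wins for FooBot and the "*" record applies only to agents with no record of their own. Other libraries may match differently.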

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Tim Bray
At 11:31 AM 07/03/02 -0800, Nick Arnett wrote: >> * Write it in Perl (or equivalent). > >I suppose it doesn't help with a book on Perl, but I'm re-writing my robots >in Python and I'm very happy with the way it's going. I consider Python to fall under "or equivalent" :) >> * Consider

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Tim Bray
At 02:51 AM 07/03/02 -0700, Sean M. Burke wrote: >Aside from basic concepts (don't hammer the server; always obey the >robots.txt; don't span hosts unless you are really sure that you want to), >are there any particular bits of wisdom that list members would want me to >pass on to my readers?
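
Two of the basics quoted above, not hammering the server and not spanning hosts, can be sketched as a small bookkeeping class. This is Python rather than Perl/LWP; the class name PoliteScheduler and the one-second default delay are assumptions, and robots.txt checking is left out for brevity.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Tracks per-host fetch times and keeps a crawl on its starting host."""

    def __init__(self, start_url, min_delay=1.0, clock=time.monotonic):
        self.home_host = urlsplit(start_url).hostname
        self.min_delay = min_delay  # assumed minimum gap between hits, seconds
        self.clock = clock
        self.last_fetch = {}

    def same_host(self, url):
        """True if url is on the host the crawl started from."""
        return urlsplit(url).hostname == self.home_host

    def wait_time(self, url):
        """Seconds to sleep before url's host may be hit again."""
        last = self.last_fetch.get(urlsplit(url).hostname)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record_fetch(self, url):
        """Note that url's host was just contacted."""
        self.last_fetch[urlsplit(url).hostname] = self.clock()
```

A crawl loop would skip any URL where same_host is false, sleep for wait_time(url), fetch, then call record_fetch(url).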

Re: Bot2001 conference

2000-12-10 Thread Tim Bray
At 03:34 PM 08/12/00 -0800, you wrote: >Anyone else going to the Bot2001 conference in SF, January 25? The >announcement is at . Reading the program, it seems kinda lame. -Tim

Re: Computing ranks

2000-10-28 Thread Tim Bray
At 12:15 AM 11/09/00 -0400, Charles Bedard wrote: >Hi, > >Can anyone share their ideas and knowledge on how >one would create a relevancy ranking system. There is a huge discipline in CompSci called "IR" for Information Retrieval, that worries a lot about this. Most of the basic techniques were
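
The excerpt stops before naming any technique. As one common IR starting point, and not necessarily what the message goes on to describe, here is a minimal TF-IDF ranking sketch in Python. The function name rank and the whitespace tokenizer are simplifications chosen for the example.

```python
import math
from collections import Counter


def rank(query, documents):
    """Return (doc_index, score) pairs, best match first, using TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_count = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scored = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in term_freq:
                idf = math.log(doc_count / doc_freq[term]) + 1.0
                score += (term_freq[term] / len(tokens)) * idf
        scored.append((index, score))
    # Highest score first; ties broken by original document order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```

Terms that are rare across the collection weigh more than terms that appear everywhere, which is the core idea most ranking schemes build on.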

Re: Bjaaland

2000-10-27 Thread Tim Bray
At 05:06 PM 12/10/00 -0700, you wrote: >I'm just curious about what Bjaaland is 'up to' these >days? Oops... haven't read the mailing list for a couple weeks. The results of bjaaland's labors will become visible at http://map.net on Nov. 14th. -Tim

Re: What happens once robots are barred?

2000-03-08 Thread Tim Bray
At 11:59 AM 3/8/00 -0800, Mark Bennett wrote: >* It should also keep track of "orphan" pages - pages that are still >accessible via the direct URL, but are no longer linked-to by other pages on >the site. > >I believe all 3 classes of pages should be removed from the index. > >The third item is a
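
The "orphan" idea in the quoted text can be sketched as a set computation over the link graph the crawler already has. This is a Python illustration; find_orphans and its parameters are hypothetical names, and the URLs in the usage note are invented.

```python
def find_orphans(indexed_urls, outlinks, entry_points=()):
    """Return indexed URLs that no other page on the site links to.

    outlinks maps each crawled page to the URLs it links to.
    entry_points (for example the home page) are never reported.
    """
    entry = set(entry_points)
    linked = set()
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked.add(target)
    return sorted(url for url in indexed_urls
                  if url not in linked and url not in entry)
```

For example, with indexed pages "/", "/a", "/b", "/old" and links "/" to "/a", "/a" to "/b" and "/", and "/old" only to itself, the function reports "/old" as the orphan.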