[Robots] Re: Correct URL, shlash at the end ?

2001-11-22 Thread Otis Gospodnetic
The above is just for consideration if the robots.txt is ever updated so the robots could be informed of this little detail. There was a push in '96 or '97 to update the robots.txt standard and I wrote a proposal back then (http://www.conman.org/people/spc/robots2.html) and

[Robots] Re: Data structures for crawlers?

2001-06-27 Thread Otis Gospodnetic
Hello, Yes, everything you said is fine. I just wanted to write 'custom data structures' and code to handle large amounts of data by flexibly keeping it either in RAM or on disk, instead of using a regular RDBMS for storing that data, like Webbase does. Otis --- Corey Schwartz [EMAIL
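One way to picture "keeping it either in RAM or on disk" is a FIFO queue that holds a bounded number of entries in memory and spills the rest to a temporary file. This is only a minimal sketch under assumptions, not the design discussed in the thread: the class name `SpillQueue`, the `ram_limit` parameter, and the newline-delimited file format are all hypothetical, and entries are assumed to be strings with no embedded newlines (URLs, for instance).

```python
import tempfile
from collections import deque


class SpillQueue:
    """FIFO queue of strings: up to `ram_limit` items in RAM, overflow in a temp file.

    Hypothetical illustration only. The file is append-only and never compacted.
    """

    def __init__(self, ram_limit=1000):
        self.ram_limit = ram_limit
        self.ram = deque()
        self.disk = tempfile.TemporaryFile(mode="w+b")
        self.read_pos = 0   # byte offset of the next unread line on disk
        self.on_disk = 0    # number of unread items on disk

    def push(self, item):
        # Once anything is waiting on disk, new items must queue behind it
        # so that FIFO order is preserved.
        if self.on_disk == 0 and len(self.ram) < self.ram_limit:
            self.ram.append(item)
        else:
            self.disk.seek(0, 2)  # jump to end of file before appending
            self.disk.write(item.encode("utf-8") + b"\n")
            self.on_disk += 1

    def pop(self):
        # Raises IndexError when the queue is completely empty.
        if not self.ram:
            self._refill()
        return self.ram.popleft()

    def _refill(self):
        # Load the next batch of spilled items back into RAM.
        self.disk.seek(self.read_pos)
        while self.on_disk and len(self.ram) < self.ram_limit:
            line = self.disk.readline()
            self.ram.append(line.rstrip(b"\n").decode("utf-8"))
            self.on_disk -= 1
        self.read_pos = self.disk.tell()
```

A real crawler would also need persistence across restarts and some way to reclaim disk space, which this sketch leaves out.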

[Robots] Re: Search Engine Spiders and Cookies

2001-06-17 Thread Otis Gospodnetic
Hello, Web 'spiders' act like regular web clients do. Depending on the spider implementation they may accept cookies, store them, and send them back to sites that set them, or they can just completely ignore them. There is no single answer. If you do not want spiders to index your sites there
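The two behaviours described above (accept and return cookies, or ignore them entirely) can be sketched with Python's standard library. The helper name `make_opener` and its `accept_cookies` flag are hypothetical, chosen for illustration; `CookieJar`, `HTTPCookieProcessor`, and `build_opener` are standard-library calls.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_opener(accept_cookies):
    """Return (opener, jar). `jar` is None when the spider ignores cookies."""
    if accept_cookies:
        jar = CookieJar()
        # HTTPCookieProcessor stores cookies from responses in `jar`
        # and sends matching ones back on later requests.
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(jar)
        )
        return opener, jar
    # The default opener has no cookie handler, so Set-Cookie headers
    # are simply never stored or replayed.
    return urllib.request.build_opener(), None
```

A spider built on the first opener would look to a site like a returning browser session; one built on the second would look like a new visitor on every request.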

Re: Looking for a gatherer.

2001-01-10 Thread Otis Gospodnetic
Add Larbin to that list. --- Krishna N. Jha [EMAIL PROTECTED] wrote: Look into webBase, pavuk, wget - there are some other similar free products out there. (I am not sure I fully understand/appreciate all your requirements, though; if you wish, you can clarify them to me.) We also have

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Otis Gospodnetic
Excellent. I have a copy of Wong's book at home and like that topic (i.e. I'm a potential customer :)) When will it be published? I think lots of people do want to know about recursive spiders, and I bet the most frequent obstacles are issues like: queueing, depth vs. breadth first
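As a rough illustration of the queueing and depth vs. breadth first question, the sketch below walks a small in-memory link graph and changes only which end of the frontier it takes the next URL from. Everything here is an assumption made for the example: the function name `crawl_order`, the `links` dictionary standing in for fetched pages, and the `max_depth` cutoff. No network access is involved.

```python
from collections import deque


def crawl_order(links, start, breadth_first=True, max_depth=2):
    """Return the order in which URLs would be visited.

    `links` maps a URL to the URLs it links to (a stand-in for real fetching).
    """
    frontier = deque([(start, 0)])
    seen = {start}
    order = []
    while frontier:
        # FIFO gives breadth-first order; LIFO gives depth-first-style order.
        url, depth = frontier.popleft() if breadth_first else frontier.pop()
        order.append(url)
        if depth >= max_depth:
            continue  # visit this page but do not follow its links
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return order
```

The `seen` set is what keeps a recursive spider from looping on pages that link back to each other; the depth limit bounds how far it wanders from the starting page.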

[Robots] Re: SV: matching and UserAgent: in robots.txt

2002-03-14 Thread Otis Gospodnetic
LWP? Very popular in a big Perl community. --- Rasmus Mohr [EMAIL PROTECTED] wrote: Any idea how widespread the use of this library is? We've observed some weird behaviors from some of the major search engines' spiders (basically ignoring robots.txt sections) - maybe this is the

[Robots] Re: better language for writing a Spider ?

2002-03-14 Thread Otis Gospodnetic
I am working on robot development in Java. We are developing a search engine; almost the complete engine is developed... We used Java for the development... but the performance of the Java API in fetching web pages is too low; basically we developed our own URL Connection, as we

RE: [Robots] Post

2002-11-08 Thread Otis Gospodnetic
Sounds interesting. I'd love to see some screenshots of some community graphs and main characters in it... possible? Otis --- Nick Arnett [EMAIL PROTECTED] wrote: As long as we're kicking around what's new, here's mine. I've been working on a system that finds topical Internet discussions

Re: [Robots] Post

2002-11-08 Thread Otis Gospodnetic
I think I remember those proposals, actually. I have never heard anyone mention them anywhere else, so I don't think anyone has implemented a crawler that looks for those new things in robots.txt. Otis --- Sean 'Captain Napalm' Conner [EMAIL PROTECTED] wrote: Well, I was surprised to recently