Sounds interesting.
I'd love to see some screenshots of some community graphs and main
characters in it. Possible?
Otis
--- Nick Arnett [EMAIL PROTECTED] wrote:
As long as we're kicking around what's new, here's mine. I've been
working on a system that finds topical Internet discussions
I think I remember those proposals, actually.
I have never heard anyone mention them anywhere else, so I don't think
anyone has implemented a crawler that looks for those new things in
robots.txt
Otis
--- Sean 'Captain Napalm' Conner [EMAIL PROTECTED] wrote:
Well, I was surprised to recently
LWP? Very popular in the large Perl community.
--- Rasmus Mohr [EMAIL PROTECTED] wrote:
Any idea how widespread the use of this library is? We've observed
some weird behaviors from some of the major search engines' spiders
(basically ignoring robots.txt sections) - maybe this is the
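For reference, here is a minimal sketch of the kind of robots.txt check a well-behaved spider performs before fetching a path. It is deliberately simplified (prefix matching on Disallow lines only, no Allow lines or wildcards); the class and method names are illustrative, not from any of the crawlers mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal robots.txt checker (sketch). A real parser must also handle
// Allow lines, wildcards, and multiple User-agent groups; this only
// does simple prefix matching on Disallow rules.
public class RobotsCheck {
    private final List<String> disallowed = new ArrayList<>();

    // Collect the Disallow paths that apply to the given agent name.
    public RobotsCheck(String robotsTxt, String agent) {
        boolean applies = false;
        for (String line : robotsTxt.split("\n")) {
            line = line.trim();
            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                String ua = line.substring(11).trim();
                applies = ua.equals("*") || ua.equalsIgnoreCase(agent);
            } else if (applies && lower.startsWith("disallow:")) {
                String path = line.substring(9).trim();
                if (!path.isEmpty()) disallowed.add(path);
            }
        }
    }

    // A path is fetchable unless it starts with a disallowed prefix.
    public boolean allowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

A spider that "ignores robots.txt sections" is simply skipping a check like this, whether by design or by a parsing bug.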
I am working on robot development, in Java.
We are developing a search engine; almost the
complete engine is developed...
We used Java for the development... but the performance
of the Java API in fetching web pages is too low, so
basically we developed our own URL connection, as
we
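The "own URL connection" idea above can be sketched as talking HTTP over a raw socket instead of going through java.net.URLConnection. This is only an illustration of the approach, not the poster's actual code; the class name is hypothetical, and HTTP/1.0 with Connection: close is used to keep the framing trivial (the server closes the socket when the body ends).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of a hand-rolled page fetcher: raw socket, HTTP/1.0.
public class RawFetcher {
    // Build the raw request; split out so it can be inspected/tested.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: example-crawler/0.1\r\n"
             + "Connection: close\r\n\r\n";
    }

    // Write the request, then read until EOF (server closes on done).
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write(
                buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            InputStream in = s.getInputStream();
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return buf.toString("ISO-8859-1"); // raw headers + body
        }
    }
}
```

The win over URLConnection mostly comes from controlling buffering, timeouts, and connection reuse yourself rather than from the socket itself.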
Excellent. I have a copy of Wong's book at home and like that topic
(i.e. I'm a potential customer :)) When will it be published?
I think lots of people do want to know about recursive spiders, and I
bet some of the most frequent obstacles are issues like queueing and
depth-first vs. breadth-first traversal
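The queueing and traversal-order point above can be sketched with a single frontier structure: with a deque, appending new links at the tail gives breadth-first crawling and appending at the head gives depth-first. This is a generic illustration (the names are mine, not from any spider discussed here).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch of a crawl frontier: one Deque, two traversal orders.
public class Frontier {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private final boolean breadthFirst;

    public Frontier(boolean breadthFirst) { this.breadthFirst = breadthFirst; }

    // Enqueue a URL once; duplicate detection is the other classic obstacle.
    public void add(String url) {
        if (seen.add(url)) {               // true only the first time
            if (breadthFirst) queue.addLast(url);   // BFS: tail
            else queue.addFirst(url);               // DFS: head
        }
    }

    public String next() { return queue.pollFirst(); }
    public boolean isEmpty() { return queue.isEmpty(); }
}
```

A real recursive spider adds per-host politeness delays and a depth cutoff on top of this, but the ordering decision lives entirely in that add() method.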
The above is just for consideration, in case the robots.txt standard
is ever updated, so that robots could be informed of this little
detail.
There was a push in '96 or '97 to update the robots.txt standard, and
I wrote a proposal back then
(http://www.conman.org/people/spc/robots2.html) and
Hello,
Yes, everything you said is fine. I just wanted to
write 'custom data structures' and code to handle
large amounts of data by flexibly keeping it either in
RAM or on disk, instead of using a regular RDBMS for
storing that data, like Webbase does.
Otis
--- Corey Schwartz [EMAIL
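The "custom data structures" idea Otis describes above - keeping data flexibly in RAM or on disk instead of in an RDBMS - can be sketched as a key/value store with a RAM cap and per-key spill files. This is my illustration of the concept under assumed names (SpillStore, etc.), not Otis's actual design.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch: hold entries in RAM up to a limit, spill the rest to disk.
public class SpillStore {
    private final Map<String, String> ram = new HashMap<>();
    private final int ramLimit;
    private final Path dir;

    public SpillStore(int ramLimit, Path dir) {
        this.ramLimit = ramLimit;
        this.dir = dir;
    }

    public void put(String key, String value) throws IOException {
        if (ram.size() < ramLimit || ram.containsKey(key)) {
            ram.put(key, value);                        // stays in memory
        } else {
            Files.writeString(dir.resolve(key), value); // spill to disk
        }
    }

    public String get(String key) throws IOException {
        String v = ram.get(key);
        if (v != null) return v;
        Path p = dir.resolve(key);
        return Files.exists(p) ? Files.readString(p) : null;
    }
}
```

A production version would batch the disk writes and index the spill files, but the RAM-or-disk decision shown in put() is the core of the approach.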
Hello,
Web 'spiders' act like regular web clients.
Depending on the spider implementation they may accept
cookies, store them, and send them back to sites that
set them, or they can just completely ignore them.
There is no single answer.
If you do not want spiders to index your sites there
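The cookie behavior described above - a spider that opts in stores each site's Set-Cookie values and sends them back on later requests to that site - can be sketched as a per-host cookie jar. This toy version (illustrative names, my own sketch) ignores paths, expiry, and attributes, all of which a real implementation must honor.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-host cookie jar for a cookie-accepting spider.
public class CookieJar {
    private final Map<String, Map<String, String>> perHost = new HashMap<>();

    // Record a "name=value; attr..." Set-Cookie header for a host,
    // keeping only the name=value pair and dropping the attributes.
    public void store(String host, String setCookie) {
        String pair = setCookie.split(";", 2)[0];
        String[] kv = pair.split("=", 2);
        perHost.computeIfAbsent(host, h -> new HashMap<>())
               .put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
    }

    // Build the Cookie header to send back, or null if none stored.
    public String headerFor(String host) {
        Map<String, String> jar = perHost.get(host);
        if (jar == null || jar.isEmpty()) return null;
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : jar.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

A spider that "completely ignores" cookies simply never calls anything like store(), which is also a valid choice - as the post says, there is no single answer.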
Add Larbin to that list.
--- Krishna N. Jha [EMAIL PROTECTED] wrote:
Look into webBase, pavuk, wget - there are some other similar free
products out there.
(I am not sure I fully understand/appreciate all your requirements,
though; if you wish, you can clarify them to me.)
We also have