Sounds interesting.
I'd love to see some screenshots of some community graphs and main
characters in it. Possible?
Otis
--- Nick Arnett [EMAIL PROTECTED] wrote:
As long as we're kicking around what's new, here's mine. I've been
working on a system that finds topical Internet discussions
I think I remember those proposals, actually.
I have never heard anyone mention them anywhere else, so I don't think
anyone has implemented a crawler that looks for those new things in
robots.txt
Otis
--- Sean 'Captain Napalm' Conner [EMAIL PROTECTED] wrote:
Well, I was surprised to recently
LWP? Very popular in the large Perl community.
--- Rasmus Mohr [EMAIL PROTECTED] wrote:
Any idea how widespread the use of this library is? We've observed
some weird behaviors from some of the major search engines' spiders
(basically ignoring robots.txt sections) - maybe this is the
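For reference, here is a minimal sketch of the kind of robots.txt check a well-behaved spider performs before fetching a path. It is deliberately simplified (prefix matching on Disallow lines only, no Allow lines or wildcards); the class and method names are illustrative, not from any of the crawlers mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal robots.txt checker (sketch). A real parser must also handle
// Allow lines, wildcards, and multiple User-agent groups; this only
// does simple prefix matching on Disallow rules.
public class RobotsCheck {
    private final List<String> disallowed = new ArrayList<>();

    // Collect the Disallow paths that apply to the given agent name.
    public RobotsCheck(String robotsTxt, String agent) {
        boolean applies = false;
        for (String line : robotsTxt.split("\n")) {
            line = line.trim();
            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                String ua = line.substring(11).trim();
                applies = ua.equals("*") || ua.equalsIgnoreCase(agent);
            } else if (applies && lower.startsWith("disallow:")) {
                String path = line.substring(9).trim();
                if (!path.isEmpty()) disallowed.add(path);
            }
        }
    }

    // A path is fetchable unless it starts with a disallowed prefix.
    public boolean allowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

A spider that "ignores robots.txt sections" is simply skipping a check like this, whether by design or by a parsing bug.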
I am working on robot development, in Java.
We are developing a search engine; almost the
complete engine is developed...
We used Java for the development... but the performance
of the Java API in fetching web pages is too low, so
basically we developed our own URL connection, as
we
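The "own URL connection" idea above can be sketched as talking HTTP over a raw socket instead of going through java.net.URLConnection. This is only an illustration of the approach, not the poster's actual code; the class name is hypothetical, and HTTP/1.0 with Connection: close is used to keep the framing trivial (the server closes the socket when the body ends).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of a hand-rolled page fetcher: raw socket, HTTP/1.0.
public class RawFetcher {
    // Build the raw request; split out so it can be inspected/tested.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: example-crawler/0.1\r\n"
             + "Connection: close\r\n\r\n";
    }

    // Write the request, then read until EOF (server closes on done).
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write(
                buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            InputStream in = s.getInputStream();
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return buf.toString("ISO-8859-1"); // raw headers + body
        }
    }
}
```

The win over URLConnection mostly comes from controlling buffering, timeouts, and connection reuse yourself rather than from the socket itself.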
Excellent. I have a copy of Wong's book at home and like that topic
(i.e. I'm a potential customer :)) When will it be published?
I think lots of people do want to know about recursive spiders, and I
bet some of the most frequent obstacles are issues like queueing and
depth-first vs. breadth-first traversal
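The queueing and traversal-order point above can be sketched with a single frontier structure: with a deque, appending new links at the tail gives breadth-first crawling and appending at the head gives depth-first. This is a generic illustration (the names are mine, not from any spider discussed here).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch of a crawl frontier: one Deque, two traversal orders.
public class Frontier {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private final boolean breadthFirst;

    public Frontier(boolean breadthFirst) { this.breadthFirst = breadthFirst; }

    // Enqueue a URL once; duplicate detection is the other classic obstacle.
    public void add(String url) {
        if (seen.add(url)) {               // true only the first time
            if (breadthFirst) queue.addLast(url);   // BFS: tail
            else queue.addFirst(url);               // DFS: head
        }
    }

    public String next() { return queue.pollFirst(); }
    public boolean isEmpty() { return queue.isEmpty(); }
}
```

A real recursive spider adds per-host politeness delays and a depth cutoff on top of this, but the ordering decision lives entirely in that add() method.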
The above is just for consideration, in case the robots.txt standard
is ever updated, so that robots could be informed of this little
detail.
There was a push in '96 or '97 to update the robots.txt standard, and
I wrote a proposal back then
(http://www.conman.org/people/spc/robots2.html) and
Hello,
Yes, everything you said is fine. I just wanted to
write 'custom data structures' and code to handle
large amounts of data by flexibly keeping it either in
RAM or on disk, instead of using a regular RDBMS for
storing that data, like Webbase does.
Otis
--- Corey Schwartz [EMAIL
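The "custom data structures" idea Otis describes above - keeping data flexibly in RAM or on disk instead of in an RDBMS - can be sketched as a key/value store with a RAM cap and per-key spill files. This is my illustration of the concept under assumed names (SpillStore, etc.), not Otis's actual design.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch: hold entries in RAM up to a limit, spill the rest to disk.
public class SpillStore {
    private final Map<String, String> ram = new HashMap<>();
    private final int ramLimit;
    private final Path dir;

    public SpillStore(int ramLimit, Path dir) {
        this.ramLimit = ramLimit;
        this.dir = dir;
    }

    public void put(String key, String value) throws IOException {
        if (ram.size() < ramLimit || ram.containsKey(key)) {
            ram.put(key, value);                        // stays in memory
        } else {
            Files.writeString(dir.resolve(key), value); // spill to disk
        }
    }

    public String get(String key) throws IOException {
        String v = ram.get(key);
        if (v != null) return v;
        Path p = dir.resolve(key);
        return Files.exists(p) ? Files.readString(p) : null;
    }
}
```

A production version would batch the disk writes and index the spill files, but the RAM-or-disk decision shown in put() is the core of the approach.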
Hello,
Web 'spiders' act like regular web clients.
Depending on the spider implementation they may accept
cookies, store them, and send them back to sites that
set them, or they can just completely ignore them.
There is no single answer.
If you do not want spiders to index your sites there
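The cookie behavior described above - a spider that opts in stores each site's Set-Cookie values and sends them back on later requests to that site - can be sketched as a per-host cookie jar. This toy version (illustrative names, my own sketch) ignores paths, expiry, and attributes, all of which a real implementation must honor.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-host cookie jar for a cookie-accepting spider.
public class CookieJar {
    private final Map<String, Map<String, String>> perHost = new HashMap<>();

    // Record a "name=value; attr..." Set-Cookie header for a host,
    // keeping only the name=value pair and dropping the attributes.
    public void store(String host, String setCookie) {
        String pair = setCookie.split(";", 2)[0];
        String[] kv = pair.split("=", 2);
        perHost.computeIfAbsent(host, h -> new HashMap<>())
               .put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
    }

    // Build the Cookie header to send back, or null if none stored.
    public String headerFor(String host) {
        Map<String, String> jar = perHost.get(host);
        if (jar == null || jar.isEmpty()) return null;
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : jar.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

A spider that "completely ignores" cookies simply never calls anything like store(), which is also a valid choice - as the post says, there is no single answer.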
Add Larbin to that list.
--- Krishna N. Jha [EMAIL PROTECTED] wrote:
Look into webBase, pavuk, wget - there are some other similar free
products out there.
(I am not sure I fully understand/appreciate all your requirements,
though; if you wish, you can clarify them to me.)
We also have