RE: user agent checking and spidering...

2004-04-05 Thread Mark A. Kruger - CFG
The robots.txt file allows you to exclude pages on THIS site that you don't want indexed. -Mark
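For reference, a minimal robots.txt along those lines - the User-agent line applies the rule to every crawler, and /search.cfm is the path used as an example later in this thread:

    User-agent: *
    Disallow: /search.cfm

Note the file must be named robots.txt and sit at the web root of each site.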

RE: user agent checking and spidering...

2004-04-05 Thread Mark A. Kruger - CFG
P.S. Actually he had NO caching, and that is our first step - it has been quite successful. -Mark

Re: user agent checking and spidering...

2004-04-05 Thread Stephen Moretti
Mark A. Kruger - CFG wrote: Dave, that's not what I'm finding. If you have a robots.txt file that says "disallow /search.cfm", it will not index the search.cfm file from the root of the server. But I cannot find anywhere where you can put in something like this: disallow http://www.someothersite.com

RE: user agent checking and spidering...

2004-04-05 Thread Mark A. Kruger - CFG
Stephen, Thanks for the URL. We do have a robots.txt file in the root of each site. Perhaps the meta tags will help. I'll check it out. -Mark
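For anyone following along, the robots meta tag goes in the head of each page a crawler should skip - for example, the search.cfm page discussed in this thread:

    <meta name="robots" content="noindex,nofollow">

noindex keeps the page out of the index; nofollow tells the spider not to follow the links on it.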

RE: user agent checking and spidering...

2004-04-05 Thread Dave Watts
That's not what I'm finding. If you have a robots.txt file that says "disallow /search.cfm", it will not index the search.cfm file from the root of the server. But I cannot find anywhere where you can put in something like this: disallow http://www.someothersite.com You see what
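To make the point concrete: Disallow takes only URL paths relative to the host that serves the robots.txt, so the first rule below is valid and the second is not. Only someothersite.com's own robots.txt can exclude its pages:

    User-agent: *
    Disallow: /search.cfm
    # Invalid - robots.txt rules cannot reference another site:
    # Disallow: http://www.someothersite.com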

Re: user agent checking and spidering...

2004-04-05 Thread Jochem van Dieten
Mark A. Kruger - CFG wrote: I have a client with many, many similar sites on a single server using CFMX. Each of the sites is part of a network of sites that all link together - about 150 to 200 sites in all. Each home page has links to other sites in the network. Periodically, it appears that

RE: user agent checking and spidering...

2004-04-04 Thread Jim Davis
I'm not sure if it's the best way to do things, but I may be able to help with the user agents. Basically, what I've done is capture all the user agents that hit my sites over the past few years. I go through periodically and (using a bit column in the table) mark whether the agents are bots or not.
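A minimal CFML sketch of the lookup Jim describes, assuming a hypothetical userAgents table with agentString and isBot columns (the table, column, and datasource names are invented for illustration):

    <cfquery name="qAgent" datasource="myDSN">
        SELECT isBot
        FROM userAgents
        WHERE agentString = <cfqueryparam value="#CGI.HTTP_USER_AGENT#"
                                          cfsqltype="cf_sql_varchar">
    </cfquery>

    <cfif qAgent.recordCount AND qAgent.isBot>
        <!--- Known spider: could serve a stripped-down, cacheable page --->
        <cfset request.isSpider = true>
    <cfelse>
        <!--- Unknown or human agent: serve the normal page --->
        <cfset request.isSpider = false>
    </cfif>

Unrecognized agents fall through to the human branch, which matches Jim's approach of classifying them by hand later.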

RE: user agent checking and spidering...

2004-04-04 Thread Mark A. Kruger - CFG
Jim, Thanks - that might be a good place to start. Can you send it to my email? Thanks! -Mark

RE: user agent checking and spidering...

2004-04-04 Thread Jim Davis
There's really no way to tell if these are homemade bots or homemade browsers. Hope this helps, Jim Davis

RE: user agent checking and spidering...

2004-04-04 Thread Dave Watts
SequeLink (the Access service for JRun, I think) locks up quickly trying to service hundreds of requests at once to the same Access file. As a short-term fix, have you considered a more aggressive caching strategy? That might be pretty easy to implement. Each site has a pretty well thought
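One easy form of "more aggressive caching" in CFMX is query caching with cachedwithin, which answers repeat requests from memory instead of reopening the Access file through SequeLink. The datasource, table, and one-hour window below are assumptions for illustration:

    <cfquery name="qHomePage" datasource="siteDSN"
             cachedwithin="#createTimeSpan(0,1,0,0)#">
        SELECT headline, body
        FROM content
        WHERE page = 'home'
    </cfquery>

Within the hour, identical queries are served from ColdFusion's query cache, so hundreds of spider hits produce one database read rather than hundreds.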