Re: HTTP_USER_AGENT question
> It notes the more common bots (google) and assigns them a timeout of 2 seconds.

I'm under the impression that bots (or just the Google bot?) don't do sessions at all, no?

On Fri, Feb 20, 2009 at 1:44 AM, Michael Dinowitz mdino...@houseoffusion.com wrote:
> I use this in my application.cfc right at top. It notes the more common bots (google) and assigns them a timeout of 2 seconds. You can use the same logic for whatever you want.
>
> IF (REFindNoCase('Slurp|Google|BecomeBot|msnbot|ZyBorg|RufusBot|EMonitor|java', cgi.http_user_agent)) This.sessionTimeout=createtimespan(0,0,0,2);
>
> On Thu, Feb 19, 2009 at 10:22 PM, Les Mizzell lesm...@bellsouth.net wrote:
> > In working out my Googlebot problem, I came across an idea of using HTTP_USER_AGENT to identify a bot, and then exclude it from an area or whatever (redirect it to www.disney.com or something... that's a joke) ... assuming it gets in despite the robots.txt file.
> >
> > So, this looks like a good starting point to modify from: http://www.bennadel.com/blog/1083-ColdFusion-Session-Management-And-Spiders-Bots.htm
> >
> > I see how to deal with bots you can identify. Can anybody think of a way or have an example of how to figure out that it's a unidentifiable bot (rather than a real user with a browser) and redirect/whatever? Just asking

~| Adobe® ColdFusion® 8 software 8 is the most important and dramatic release to date
Get the Free Trial http://ad.doubleclick.net/clk;207172674;29440083;f
Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319597
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4
Re: HTTP_USER_AGENT question
They do, but they are usually one-shot sessions. Basically, most bots do not keep state. When a bot hits one page and gets a session, the next page it hits is treated as a new visit, in other words, a new session. Bots that keep state and hold a single session used to be rare but are becoming more prevalent. When I write bots, I always write them with state management. Bottom line: if you have session management turned on, every visitor will have a session.

On Fri, Feb 20, 2009 at 7:14 AM, John M Bliss bliss.j...@gmail.com wrote:
> > It notes the more common bots (google) and assigns them a timeout of 2 seconds.
>
> I'm under the impression that bots (or just the Google bot?) don't do sessions at all, no?

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319598
Re: HTTP_USER_AGENT question
On Fri, Feb 20, 2009 at 6:30 AM, Michael Dinowitz mdino...@houseoffusion.com wrote:
> Bottom line is that if you have session turned on, every visitor will have a session.

But not until you set a session var, right?

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319599
Re: HTTP_USER_AGENT question
> > Bottom line is that if you have session turned on, every visitor will have a session.
>
> But not until you set a session var, right?

No, the session will exist if you've enabled session management, regardless of whether you create any session variables yourself.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/

Fig Leaf Software provides the highest caliber vendor-authorized instruction at our training centers in Washington DC, Atlanta, Chicago, Baltimore, Northern Virginia, or on-site at your location. Visit http://training.figleaf.com/ for more information!

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319600
Re: HTTP_USER_AGENT question
No, just running a cfapplication tag or an Application.cfc with session management enabled creates the session (CF has to create the SESSION scope to store the session ID).

mxAjax / CFAjax docs and other useful articles: http://www.bifrost.com.au/blog/

2009/2/20 John M Bliss bliss.j...@gmail.com:
> > Bottom line is that if you have session turned on, every visitor will have a session.
>
> But not until you set a session var, right?

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319601
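One rough way to check this on your own server is a throwaway test page like the sketch below. The application name is made up for illustration. The page enables session management, sets no session variables of its own, and lists whatever keys the SESSION scope already holds. Results may differ between engines and configurations, so treat it as a quick experiment rather than a statement of how any particular server behaves.

```cfml
<!--- sessioncheck.cfm: hypothetical test page, not from the thread --->
<!--- Enable session management without ever writing to the session scope --->
<cfapplication name="exampleSessionCheck" sessionmanagement="yes">

<!--- List the keys the engine has already placed in the session scope --->
<cfoutput>
    Keys currently in the session scope: #structKeyList(session)#
</cfoutput>
```

If the list is not empty on the first request, the session was created before any of your own code touched it.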
Re: HTTP_USER_AGENT question
> > It notes the more common bots (google) and assigns them a timeout of 2 seconds.
>
> I'm under the impression that bots (or just the Google bot?) don't do sessions at all, no?

Most crawlers do not return cookies, and since session management generally depends on cookies, each request from the same crawler will create a new session. But if the sessions created for that crawler all expire in two seconds, those sessions won't consume a significant amount of memory.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319602
RE: HTTP_USER_AGENT question
-Original Message-
From: Dave Watts [mailto:dwa...@figleaf.com]
Sent: 20 February 2009 12:48
To: cf-talk
Subject: Re: HTTP_USER_AGENT question

> > Bottom line is that if you have session turned on, every visitor will have a session.
>
> But not until you set a session var, right?

No, the session will exist if you've enabled session management, regardless of whether you create any session variables yourself.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/

Note that on Railo this isn't the case:

If an application/session scope is defined with the tag CFAPPLICATION ... or the application.cfc, in Railo it will not automatically exist. Only when the scope is used for the first time it will be created. If the scope is not used, it won't be created either.

From: http://www.railo-technologies.com/en/index.cfm?treeID=185

Adrian

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319603
Re: HTTP_USER_AGENT question
I learn something every day. Thanks, y'all.

On Fri, Feb 20, 2009 at 6:49 AM, Dave Watts dwa...@figleaf.com wrote:
> Most crawlers do not return cookies, and since session management generally depends on cookies, each request from the same crawler will create a new session. But if the sessions created for that crawler all expire in two seconds, those sessions won't consume a significant amount of memory.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319604
Re: HTTP_USER_AGENT question
Interesting. That seems like a more server-memory-friendly approach. Wonder why Adobe CF does not do it that way...

On Fri, Feb 20, 2009 at 7:12 AM, Adrian Lynch cont...@adrianlynch.co.uk wrote:
> Note that on Railo this isn't the case:
>
> If an application/session scope is defined with the tag CFAPPLICATION ... or the application.cfc, in Railo it will not automatically exist. Only when the scope is used for the first time it will be created. If the scope is not used, it won't be created either.
>
> From: http://www.railo-technologies.com/en/index.cfm?treeID=185

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319605
Re: HTTP_USER_AGENT question
I can't think of many sites I've built that had session management turned on but didn't use session variables somewhere on every page request, so the point would largely be moot since sessions were guaranteed to be used. If I wasn't using them I probably would have turned them off in CF Admin in the first place.

~Brad

- Original Message -
From: John M Bliss bliss.j...@gmail.com
To: cf-talk cf-talk@houseoffusion.com
Sent: Friday, February 20, 2009 7:20 AM
Subject: Re: HTTP_USER_AGENT question

> Interesting. That seems like a more server-memory-friendly approach. Wonder why Adobe CF does not do it that way...

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319611
Re: HTTP_USER_AGENT question
You've not built many sites with a public-facing, no-sessions-needed front-end and some admin and/or members-only interfaces that require auth? That describes 2/3 of the sites I've built...

On Fri, Feb 20, 2009 at 9:45 AM, Brad Wood b...@bradwood.com wrote:
> I can't think of many sites I've built that had session management turned on but didn't use session variables somewhere on every page request, so the point would largely be moot since sessions were guaranteed to be used. If I wasn't using them I probably would have turned them off in CF Admin in the first place.
>
> ~Brad

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319613
Re: HTTP_USER_AGENT question
> I came across an idea of using HTTP_USER_AGENT to identify a bot ... assuming it gets in despite the robots.txt file.

Not a very good idea. HTTP_USER_AGENT will help you identify ONLY good bots that actually comply with the robots.txt file anyway. Any bad bot with ill intent will be smart enough to forge a browser agent so that you will not identify it as a bot. Only some smart behavior analysis can detect bots with acceptable accuracy:

- rate of HTTP requests,
- reads images (most good bots don't need to read images)
- executes Javascript?
- etc...

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319616
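To make the first signal in that list (rate of HTTP requests) a little more concrete, here is a minimal sketch of a per-IP request counter in an Application.cfc. None of it comes from the thread: the application name, the 10-second window, the 20-request threshold, and the `request.suspectedBot` flag are all assumptions chosen for illustration, and the counter struct is never purged, so a real version would need cleanup and tuning.

```cfml
<!--- Application.cfc: illustrative sketch only; all names and numbers are assumed --->
<cfcomponent output="false">

    <cfset this.name = "exampleApp">

    <cffunction name="onRequestStart" returntype="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true">
        <cfset var ip = cgi.remote_addr>
        <cfset var nowTick = getTickCount()>

        <cflock scope="application" type="exclusive" timeout="5">
            <!--- Create the shared counter struct on first use --->
            <cfif NOT structKeyExists(application, "hits")>
                <cfset application.hits = structNew()>
            </cfif>

            <!--- Start a new 10-second window for unseen IPs or expired windows --->
            <cfif (NOT structKeyExists(application.hits, ip))
                  OR ((nowTick - application.hits[ip].start) GT 10000)>
                <cfset application.hits[ip] = structNew()>
                <cfset application.hits[ip].start = nowTick>
                <cfset application.hits[ip].count = 0>
            </cfif>

            <!--- Count this request and flag IPs above the assumed threshold --->
            <cfset application.hits[ip].count = application.hits[ip].count + 1>
            <cfset request.suspectedBot = (application.hits[ip].count GT 20)>
        </cflock>

        <cfreturn true>
    </cffunction>

</cfcomponent>
```

Later code in the same request could check `request.suspectedBot` and decide what to do. Request rate alone will misfire (shared proxies, eager humans), which is presumably why the message lists it as only one signal among several.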
Re: HTTP_USER_AGENT question
> the session will exist if you've enabled session management, regardless of whether you create any session variables yourself.

Exactly, if only for the session ID.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319617
Re: HTTP_USER_AGENT question
> I'm under the impression that bots (or just the Google bot?) don't do sessions at all, no?

Neither bots nor browsers do sessions. Your application makes them. The problem with bots is that they do not keep session IDs in cookies, so your application creates a new session for every page, which can clog your server memory rapidly. The trick is to create very short sessions for bots so that they disappear rapidly.

My scheme is just a bit different: I set a short session timeout by default, then comes my agent analyzer, and IF the visitor is assumed human, then I extend the session timeout to 2 hours.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319618
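The "short by default, extend for humans" scheme described above might look roughly like the sketch below. The actual agent analyzer is not shown in the thread, so the user-agent test here is only a stand-in, and the application name and the 10-second default are assumed values; only the 2-hour extension comes from the message.

```cfml
<!--- Application.cfc: illustrative sketch; the "analyzer" is a hypothetical placeholder --->
<cfcomponent output="false">

    <cfset this.name = "exampleApp">
    <cfset this.sessionManagement = true>

    <!--- Short timeout by default: treat every visitor as a bot until shown otherwise --->
    <cfset this.sessionTimeout = createTimeSpan(0, 0, 0, 10)>

    <!--- Placeholder analyzer: browser-like agent string with no crawler keywords --->
    <cfset variables.agent = cgi.http_user_agent>
    <cfset variables.assumedHuman =
        (REFindNoCase("^(Mozilla|Opera)", variables.agent) GT 0)
        AND (REFindNoCase("bot|crawl|spider|slurp", variables.agent) EQ 0)>

    <!--- Visitors assumed to be human get the 2-hour session from the message --->
    <cfif variables.assumedHuman>
        <cfset this.sessionTimeout = createTimeSpan(0, 2, 0, 0)>
    </cfif>

</cfcomponent>
```

The appeal of this direction is that an unrecognized crawler falls into the short-timeout bucket without needing to be on any list. The cost is that a human with an unusual agent string gets a very short session.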
Re: HTTP_USER_AGENT question
In those cases, I give the separate admin/members-only area its own cfapplication tag. Disable session management for the public app and enable it for the private one.

~Brad

- Original Message -
From: John M Bliss bliss.j...@gmail.com
To: cf-talk cf-talk@houseoffusion.com
Sent: Friday, February 20, 2009 9:51 AM
Subject: Re: HTTP_USER_AGENT question

> You've not built many sites with a public-facing, no-sessions-needed front-end and some admin and/or members-only interfaces that require auth? That describes 2/3 of the sites I've built...

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319624
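As a sketch of that split, assuming the private area lives in an /admin directory with its own Application.cfm (the directory layout, application names, and 30-minute timeout are made up for illustration), the two files might contain:

```cfml
<!--- File 1: /Application.cfm (public site), session management off --->
<cfapplication name="examplePublic" sessionmanagement="no">

<!--- File 2: /admin/Application.cfm (private area), session management on --->
<cfapplication name="exampleAdmin"
               sessionmanagement="yes"
               sessiontimeout="#createTimeSpan(0, 0, 30, 0)#">
```

With this arrangement, crawlers that only ever reach the public pages should not cause any sessions to be created at all.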
Re: HTTP_USER_AGENT question
It's trivial to fake this header and many bad bots (i.e. the ones that ignore robots.txt) will pretend to be IE or another browser. Claude S has posted his solutions in the past and that should all be in the archives.

2009/2/20 Les Mizzell lesm...@bellsouth.net:
> In working out my Googlebot problem, I came across an idea of using HTTP_USER_AGENT to identify a bot, and then exclude it from an area or whatever (redirect it to www.disney.com or something... that's a joke) ... assuming it gets in despite the robots.txt file.
>
> So, this looks like a good starting point to modify from: http://www.bennadel.com/blog/1083-ColdFusion-Session-Management-And-Spiders-Bots.htm
>
> I see how to deal with bots you can identify. Can anybody think of a way or have an example of how to figure out that it's a unidentifiable bot (rather than a real user with a browser) and redirect/whatever? Just asking

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319585
RE: HTTP_USER_AGENT question
Yeah, as a general rule: never base security on anything in the cgi scope. Anything that comes in the request header can be spoofed.

~Brad

Original Message
Subject: Re: HTTP_USER_AGENT question
From: James Holmes james.hol...@gmail.com
Date: Thu, February 19, 2009 11:47 pm
To: cf-talk cf-talk@houseoffusion.com

> It's trivial to fake this header and many bad bots (i.e. the ones that ignore robots.txt) will pretend to be IE or another browser.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319589
Re: HTTP_USER_AGENT question
I use this in my application.cfc right at top. It notes the more common bots (google) and assigns them a timeout of 2 seconds. You can use the same logic for whatever you want.

IF (REFindNoCase('Slurp|Google|BecomeBot|msnbot|ZyBorg|RufusBot|EMonitor|java', cgi.http_user_agent)) This.sessionTimeout=createtimespan(0,0,0,2);

On Thu, Feb 19, 2009 at 10:22 PM, Les Mizzell lesm...@bellsouth.net wrote:
> In working out my Googlebot problem, I came across an idea of using HTTP_USER_AGENT to identify a bot, and then exclude it from an area or whatever (redirect it to www.disney.com or something... that's a joke) ... assuming it gets in despite the robots.txt file.
>
> So, this looks like a good starting point to modify from: http://www.bennadel.com/blog/1083-ColdFusion-Session-Management-And-Spiders-Bots.htm
>
> I see how to deal with bots you can identify. Can anybody think of a way or have an example of how to figure out that it's a unidentifiable bot (rather than a real user with a browser) and redirect/whatever? Just asking

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:319590
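For anyone wondering where that one-liner sits, here is a sketch of it inside a complete tag-based Application.cfc. The regular expression and the 2-second timeout are taken from the message above; the application name and the 20-minute default for everyone else are assumptions added so the file stands on its own.

```cfml
<!--- Application.cfc: illustrative sketch; name and default timeout are assumed --->
<cfcomponent output="false">

    <cfset this.name = "exampleApp">
    <cfset this.sessionManagement = true>

    <!--- Default timeout for ordinary visitors (assumed value) --->
    <cfset this.sessionTimeout = createTimeSpan(0, 0, 20, 0)>

    <!--- Known crawlers get a 2-second session, as in the message above --->
    <cfif REFindNoCase("Slurp|Google|BecomeBot|msnbot|ZyBorg|RufusBot|EMonitor|java", cgi.http_user_agent)>
        <cfset this.sessionTimeout = createTimeSpan(0, 0, 0, 2)>
    </cfif>

</cfcomponent>
```

Because the check relies only on the user-agent string, it helps with memory use from well-behaved crawlers and nothing more; a bot that claims to be a browser gets the normal timeout.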