On Tue, 2010-11-30 at 00:19 -0200, Thiago H. Pojda wrote:

> Quit top posting.
> On Mon, Nov 29, 2010 at 9:55 PM, Ron Piggott <ron.pigg...@actsministries.org
> > wrote:
> >
> > My issue with the user agent is unresolved.  I need to do more research to
> > see how AWSTATS distinguishes between a robot crawling the site and a web
> > page user and set the user-agent accordingly.
> >
> Ron,
> AWSTATS probably uses a knowledge base of known bots, though I'm not sure.
> If that's the case, you can just set your User-Agent to a known one and see
> how that goes.
> Look for Googlebot, Majestic, Ask.com (now dead - probably a good pick),
> MSNBot here: http://www.user-agents.org/
> As for setting the User-Agent in your request, I like to use this cURL
> snippet (based on a note on curl's manual page):
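> <?php
>   // Target URL and the User-Agent string to present to the server
>   $sUrl = 'http://www.example.com/';
>   $sUserAgent = 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)';
>   $hCurl = curl_init();
>   curl_setopt($hCurl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
>   curl_setopt($hCurl, CURLOPT_URL, $sUrl);
>   curl_setopt($hCurl, CURLOPT_CONNECTTIMEOUT, 120);  // seconds allowed to connect
>   curl_setopt($hCurl, CURLOPT_TIMEOUT, 120);         // seconds allowed for the whole request
>   curl_setopt($hCurl, CURLOPT_USERAGENT, $sUserAgent);
>   $sContent = curl_exec($hCurl);                     // false on failure
>   curl_close($hCurl);
> ?>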
> Cheers,
> Thiago Henrique Pojda
> +55 41 8856-7925

There's a very easy way to read in a user agent and determine whether it
is a bot: search for browscap.ini. It is basically a massive ini file
containing details of known user agent header strings and some basic
information about each, including whether it is a bot. There are various
functions in PHP for parsing it, including the built-in get_browser().
I'm still not exactly sure what you want to do, though: so far you've
asked both how other scripts check for bots and how to change your own
user agent string, and those are two quite different things.
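
As a rough sketch of the browscap approach, something like the following
should work, assuming the browscap directive in php.ini points at a
downloaded browscap.ini. The helper name isCrawler() is made up for this
example, and the 'crawler' key is the field browscap.ini normally uses to
flag bots; check your copy of the file if the lookup comes back empty.

```php
<?php
// Hypothetical helper: returns true or false when browscap has an answer,
// or null when browscap is not configured or the lookup fails.
function isCrawler(string $sUserAgent): ?bool
{
    // get_browser() needs the "browscap" directive in php.ini to point
    // at a browscap.ini file; without it there is nothing to look up.
    if (!ini_get('browscap')) {
        return null;
    }

    // Second argument true asks for an array instead of an object.
    $aInfo = @get_browser($sUserAgent, true);
    if (!is_array($aInfo) || !array_key_exists('crawler', $aInfo)) {
        return null;
    }

    // Values read from the ini file may arrive as strings such as "1" or "".
    return filter_var($aInfo['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// Check the visitor's user agent, falling back to a sample bot string
// when the script is run from the command line.
$sUserAgent = $_SERVER['HTTP_USER_AGENT']
    ?? 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)';
var_dump(isCrawler($sUserAgent));
```

A null result here means "could not tell", which keeps a missing
browscap.ini from being mistaken for "not a bot".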

