On 25 November 2010 11:32, Deva <[email protected]> wrote:
> Use curl
> http://php.net/manual/en/book.curl.php
>
>
> On Thu, Nov 25, 2010 at 4:41 PM, Shreyas Agasthya <[email protected]> wrote:
>
>> I feel you should use more of the 4th method here as you are not trying to
>> read the file but the header level (7th layer) information of the HTTP
>> protocol.
>>
>> http://php.net/manual/en/function.file-get-contents.php
>>
>>
>> --Shreyas
>>
>> On Thu, Nov 25, 2010 at 4:11 PM, Ron Piggott <[email protected]> wrote:
>>
>> > Will the header pass with using file_get_contents, or should I be using
>> > another command, and if so, which one? Ron
>> >
>> > <?php
>> >
>> > header('User Agent: RonBot (http://www.example.com)');
>> > $url = "http://www.example.com";
>> >
>> > $input = file_get_contents($url);
>> >
>> >
>> >
>> > The Verse of the Day
>> > “Encouragement from God’s Word”
>> > http://www.TheVerseOfTheDay.info
>> >
>> > *From:* Shreyas Agasthya <[email protected]>
>> > *Sent:* Thursday, November 25, 2010 4:21 AM
>> > *To:* Ron Piggott <[email protected]>
>> > *Cc:* [email protected] ; [email protected]
>> > *Subject:* Re: [PHP] Fw: Spoofing user_agent
>> >
>> > A standard HTTP Request headers is : User Agent (without the underscore).
>> >
>> > --Shreyas
>> >
>> > On Thu, Nov 25, 2010 at 2:36 PM, Ron Piggott <[email protected]> wrote:
>> >
>> >>
>> >> Is this what you are telling me to do:
>> >>
>> >> header('user_agent: RonBot (http://www.theverseoftheday.info)');
>> >>
>> >> Ron
>> >>
>> >> The Verse of the Day
>> >> “Encouragement from God’s Word”
>> >> http://www.TheVerseOfTheDay.info
>> >>
>> >> From: [email protected]
>> >> Sent: Thursday, November 25, 2010 3:34 AM
>> >> To: Ron Piggott ; [email protected]
>> >> Subject: Re: [PHP] Fw: Spoofing user_agent
>> >>
>> >> You need to set it in the header request you make. Putting it in the
>> >> script you're using as a spider with ini_set won't do anything because the
>> >> target site doesn't know anything about it.
>> >>
>> >> Thanks,
>> >> Ash
>> >> http://www.ashleysheridan.co.uk
>> >>
>> >> ----- Reply message -----
>> >> From: "Ron Piggott" <[email protected]>
>> >> Date: Thu, Nov 25, 2010 08:25
>> >> Subject: [PHP] Fw: Spoofing user_agent
>> >> To: <[email protected]>
>> >>
>> >> I have written a script to generate a sitemap of my web site. It crawls all
>> >> of the site web pages. (About 30,000)
>> >>
>> >> I need help to spoof the user_agent variable so the stats program running
>> >> in the background (“AWSTATS”) will treat the crawl as a bot, not browsing
>> >> usage.
>> >>
>> >> The sitemap generator is a cron job. I tried the syntax:
>> >> ini_set('user_agent', 'RonBot (http://www.theverseoftheday.info)');
>> >>
>> >> This didn’t work. The browsing was attributed to the dedicated IP
>> >> address.
>> >>
>> >> How do I get AWSTATS to access this, such as other entries under the
>> >> “Robots/Spiders visitors” heading:
>> >> Unknown robot (identified by 'bot*')
>> >>
>> >> I don’t mean any ill will by changing this setting. Thanks for the help.
>> >>
>> >> Ron
>> >>
>> >> The Verse of the Day
>> >> “Encouragement from God’s Word”
>> >> http://www.TheVerseOfTheDay.info
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards,
>> > Shreyas Agasthya
>> >
>>
>>
>>
>> --
>> Regards,
>> Shreyas Agasthya
>>
>
>
>
> --
> :DJ
>
There is no point in using header() here. It sets a response header that
your script sends to its own client, not a request header sent to the
server that file_get_contents() contacts.

I use stream contexts:

    $s_Contents = file_get_contents(
        $s_URL,
        false,
        stream_context_create(
            array(
                'http' => array(
                    'method' => 'GET',
                    'header' => "User-Agent: RonBot (http://www.example.com)\r\n"
                ),
            )
        )
    );

You can supply cookies, or anything else, with the request. Make sure
you end each header with \r\n and concatenate them into a single string.
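As a rough sketch of that concatenation: build_headers() below is a hypothetical helper (not part of PHP), and the Cookie and Accept values are placeholders made up for illustration. It only shows one way to end every header with \r\n before handing the string to stream_context_create().

```php
<?php
// Hypothetical helper: turns name => value pairs into one header string,
// ending every header line with \r\n as the 'header' option expects.
function build_headers(array $a_Headers)
{
    $s_Headers = '';
    foreach ($a_Headers as $s_Name => $s_Value) {
        $s_Headers .= $s_Name . ': ' . $s_Value . "\r\n";
    }
    return $s_Headers;
}

$s_Headers = build_headers(array(
    'User-Agent' => 'RonBot (http://www.example.com)',
    'Cookie'     => 'session=abc123', // placeholder value
    'Accept'     => 'text/html',      // placeholder value
));

$r_Context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => $s_Headers,
    ),
));

// The actual request would then be:
// $s_Contents = file_get_contents($s_URL, false, $r_Context);
```

The 'header' option can also be given as an array of header lines, but a single concatenated string matches the approach described above.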

If you are doing this in a loop, then I'd recommend creating a default
stream context. The request then becomes just

    $s_Contents = file_get_contents($s_URL);

because the default stream context is applied automatically.
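A minimal sketch of the default-context approach, assuming PHP 5.3 or later for stream_context_set_default(). The crawl() function and its URL list are hypothetical names used only for illustration; nothing is fetched until crawl() is called with real URLs.

```php
<?php
// Set the default context once, before any requests are made. Calls to
// file_get_contents() that pass no context of their own will use it.
stream_context_set_default(array(
    'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: RonBot (http://www.example.com)\r\n",
    ),
));

// Hypothetical crawl loop: fetches each URL using the default context
// and returns the page bodies keyed by URL.
function crawl(array $a_URLs)
{
    $a_Pages = array();
    foreach ($a_URLs as $s_URL) {
        $s_Contents = file_get_contents($s_URL);
        if ($s_Contents === false) {
            continue; // skip pages that could not be fetched
        }
        $a_Pages[$s_URL] = $s_Contents;
    }
    return $a_Pages;
}

// Example call (performs real HTTP requests):
// $a_Pages = crawl(array('http://www.example.com/'));
```

Setting the context once keeps the loop body free of per-request options, which is the point of using a default context here.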

I had to use a default stream context to route all http requests
through an NTLM authentication proxy server because PHP doesn't deal
with NTLM authentication.

See my user notes on
http://docs.php.net/manual/en/function.stream-context-get-default.php.
Don't bother with the link at the bottom of the user note; it's not
live.

Richard.
--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php