What about taking advantage of curl's built-in cookie handling?

In particular, you should look at doing this as a two-step process
using the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options. First,
log in. Then, once the session has begun, grab the article itself.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.yoursite.com"); // set url to post to
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://www.wsj.com");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 50); // times out after 50s
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt"); // file the session cookies are saved to
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt"); // uses previous session cookies
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "mozilla/5.0 (x11; u; linux i686; "
    . "en-us; rv:1.5a) gecko/20030728 mozilla firebird/0.6.1");
$content = curl_exec($ch); // run the request
curl_close($ch);
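
To make the two-step idea concrete, here is a rough sketch of logging in and then fetching the article with the same cookie file. The helper names (session_options, fetch_with_session, login_then_fetch), the login URL and the form field names are all made up for illustration; I don't know what the real login form expects, so treat this as a starting point rather than working WSJ code.

```php
<?php
// Hypothetical helper: builds the options both requests share, so that
// they read from and write to the same cookie file.
function session_options($url, $cookieFile)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_FOLLOWLOCATION => 1,           // allow redirects
        CURLOPT_RETURNTRANSFER => 1,           // return into a variable
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are loaded from here
    );
}

// Hypothetical helper: runs one request and returns the body, or false
// on failure.
function fetch_with_session(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    // The cookie jar is written when the handle is closed (on PHP 8+,
    // when the handle is freed as this function returns).
    curl_close($ch);
    return $body;
}

// Hypothetical helper: step 1 posts the login form, step 2 fetches the
// article using the cookies that step 1 saved.
function login_then_fetch($loginUrl, $postFields, $articleUrl, $cookieFile)
{
    $login = session_options($loginUrl, $cookieFile);
    $login[CURLOPT_POST] = 1;
    $login[CURLOPT_POSTFIELDS] = $postFields; // placeholder form fields
    if (fetch_with_session($login) === false) {
        return false; // login request failed, nothing to fetch
    }
    return fetch_with_session(session_options($articleUrl, $cookieFile));
}
```

You would call it along the lines of login_then_fetch("http://www.yoursite.com/login", "user=USERNAME&password=PASSWORD", "http://www.yoursite.com/article", "my_cookies.txt"), with those URLs and field names swapped for whatever the real site uses.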

        -- jon

-------------------
jon roig
web developer
email: [EMAIL PROTECTED]
phone: 888.230.7557

-----Original Message-----
From: Richard Miller [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 09, 2004 6:29 PM
To: [EMAIL PROTECTED]
Subject: [PHP] CURL and Cookies


I would appreciate any help you can give me about a problem I am having 
with PHP's CURL functions.

I want to use CURL to download news from Wall Street Journal Online.   
When you visit the WSJ home page, you're forwarded to an authentication 
page to enter your name and password, and then forwarded back to the 
home page.  I want my CURL command to send the authentication cookie so 
when it's forwarded to the authentication page it forwards right back 
to the home page without having to enter the name and password.

I can get the following CURL command to run fine at the command prompt, 
but not in PHP:

THIS WORKS
        curl --cookie "WSJIE_LOGIN=blahblahblah" -L -O \
            "http://online.wsj.com/home/us"

THIS DOESN'T WORK
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://online.wsj.com/home/us");
        curl_setopt($ch, CURLOPT_COOKIE, "WSJIE_LOGIN=blahblahblah");
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $content = curl_exec($ch);
        curl_close($ch);



I used a packet sniffer to see how this works.  When I request the home 
page (above) and send the WSJIE_LOGIN cookie, the home page redirects 
to the authentication page.  The authentication page uses the 
WSJIE_LOGIN cookie to generate more cookies.  Then these 5-6 cookies 
are sent back to the home page and give the user access to the content. 
  The WSJIE_LOGIN cookie is my own personal authentication cookie; the 
other cookies change from time to time.  But I noticed that the PHP 
CURL isn't perpetuating these other cookies when it forwards back to 
the home page, like the command-line CURL does.  Here are blocks from 
the packet capture:

CLI CURL
        ...
        192.168.001.100.63745-206.157.193.068.00080: GET /home/us HTTP/1.1
        User-Agent: curl/7.10.2 (powerpc-apple-darwin7.0) libcurl/7.10.2 OpenSSL/0.9.7b zlib/1.1.4
        Cookie: WSJIE_LOGIN=abc
        Host: online.wsj.com
        Pragma: no-cache
        Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
        Cookie: fastlogin=xyz; wsjproducts=xyz; user_type=xyz; REMOTE_USER=xyz; UBID=xyz
        ...

PHP CURL
        ...
        192.168.001.100.63750-206.157.193.068.00080: GET /home/us HTTP/1.1
        Cookie: WSJIE_LOGIN=abc
        Host: online.wsj.com
        Pragma: no-cache
        Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
        ...

PHP's curl doesn't forward the cookies that it is given at the previous 
page, so, of course, I don't get my content.  Any ideas why?

Richard Miller

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


---
Incoming mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.572 / Virus Database: 362 - Release Date: 1/27/2004
 

