OUTFILE;
(Andrew Johnson )
) Marketing Writer (
( Elias/Savion Advertising )
( Phone: 412.642.7700 Fax 412.642.2277 )
) www.elias-savion.com
I've been wrestling with a script to scrape some information from
BusinessWeek.com for a while now, but I've run into problems trying to
authenticate my agent to the BusinessWeek server.
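For a site that tracks logins with a session cookie, one common approach is to give the agent a cookie jar and submit the login form as a POST. The following is only a minimal sketch under that assumption: the URL, the agent string, and the form field names (`username`, `password`) are hypothetical and would have to be taken from the site's real login form. The request is built but not sent, so it can be inspected first.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(POST);

# Hypothetical login URL; replace with the action URL of the real form.
my $login_url = 'http://www.example.com/login';

# An in-memory cookie jar, so a session cookie set at login is sent
# back on every later request made with the same agent.
my $ua = LWP::UserAgent->new(agent => 'MyScraper/0.1');
$ua->cookie_jar(HTTP::Cookies->new);

# Build the form POST without sending it.
# The field names here are assumptions, not the site's actual names.
sub build_login_request {
    my ($url, $user, $pass) = @_;
    return POST $url, [ username => $user, password => $pass ];
}

my $req = build_login_request($login_url, 'me', 'secret');
print $req->method, ' ', $req->uri, "\n";
print $req->content, "\n";

# To actually log in, send the request through the agent:
# my $res = $ua->request($req);
# die 'Login failed: ', $res->status_line unless $res->is_success;
```

Once the login request succeeds, later `$ua->get(...)` calls reuse the same cookie jar, which is usually what a cookie-based session check expects.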
My script pulls a list of URLs by running some searches. Some of the URLs
resulting from the searches kick
=Ha786ec07e744c6efd8cdd1df37580a9c:session_id=8f51eaf294c711d98900e4
d5b08dcaefkid=310001.100170ss=env; Path=/
Title: BusinessWeek Online
XXX-Authenticate: CGIPassword
Here's some code for you. It doesn't do any form input, and you might
consider making it more friendly to the webserver with some sleep lines,
depending on who you're scraping.
use strict;
#use warnings;
use LWP::UserAgent;
use HTML::TokeParser;
use HTTP::Cookies::Netscape;
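The rest of the posted script is cut off here. As a rough illustration of how HTML::TokeParser is typically used in a scraper of this kind, the sketch below pulls the `href` of every link out of an HTML string. The `extract_links` helper and the sample markup are made up for the example and are not taken from the original script.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the href of every <a> start tag in an HTML string.
sub extract_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML";
    my @links;
    while (my $tag = $parser->get_tag('a')) {
        # $tag->[1] is the hash of attributes for the tag.
        my $href = $tag->[1]{href};
        push @links, $href if defined $href;
    }
    return @links;
}

# Made-up sample page; the middle anchor has no href and is skipped.
my $sample = '<p><a href="/one.htm">One</a> <a name="x">skip</a> '
           . '<a href="/two.htm">Two</a></p>';
my @links = extract_links($sample);
print "$_\n" for @links;

# When fetching each link with an LWP::UserAgent object in $ua, a pause
# between requests keeps the load on the server low, for example:
#   for my $url (@links) { my $res = $ua->get($url); sleep 2; }
```

The two-second pause is an arbitrary choice; how long to wait depends on the site being scraped.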
When I went to port my code that scrapes the BusinessWeek site for search
results to BaselineMag.com, I found that the HTML response for the site's
search results does not contain the actual text that appears in my
browser.
For example, the source code of:
.
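sub Report
{ open (ARTICLES, $_[0]) or die "Cannot open $_[0]: $!";
# the filename must be a quoted string, not a bareword
open (DATA, "data.csv") or die "Cannot open data.csv: $!";
# <ARTICLES> reads one line per iteration into $_
while (<ARTICLES>)
{
my $count=0;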
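The posted sub is cut off, so what it does with each line is unknown. Purely as an illustration of the same read-and-write loop in a more defensive style (lexical filehandles, three-argument `open`, a counter kept outside the loop), here is a small sketch. The `report` helper and the `count,line` row format are assumptions made for the example, and in-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Read lines from one handle and write "count,line" rows to another.
# Blank lines are skipped. Returns the number of rows written.
sub report {
    my ($in, $out) = @_;
    my $count = 0;    # declared once, outside the loop, so it accumulates
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';
        $count++;
        print {$out} "$count,$line\n";
    }
    return $count;
}

# In-memory filehandles stand in for the articles file and data.csv.
my $articles = "First article\n\nSecond article\n";
my $csv      = '';
open my $in,  '<', \$articles or die "Cannot read articles: $!";
open my $out, '>', \$csv      or die "Cannot write CSV: $!";
my $rows = report($in, $out);
close $in;
close $out;
print $csv;
```

This sketch does no CSV quoting, so a line that itself contains a comma would produce a malformed row; a dedicated CSV module would be the safer choice for real data.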
There are Google APIs:
http://code.google.com/
-Original Message-
From: Ryan Perry [mailto:[EMAIL PROTECTED]
Sent: Fri 6/16/2006 3:13 PM
To: libwww@perl.org
Subject: AJAX/Google Pages
I want to access Google Pages. Since there is no native API I thought I
could use LWP. Can I