If you are on Windows, you can "drive" IE via OLE and screen-scrape the 
results.

Here's a bit of code I used some time ago (way back when we had physical 
servers) to get the warranty information from an HP web site.

use Win32;        # provides Win32::Sleep
use Win32::OLE;   # needed for Win32::OLE->new below

sub GetWarranty {

                my $serial = shift;
                my $prod = shift;

                my $parms = 'country=US&' .
                    "serialNumber1=$serial&" .
                                'BODServiceID=NA&' .
                    "productNumber=$prod&" .
                    'RegisteredPurchaseDate=';

                my $ie = Win32::OLE->new('InternetExplorer.Application');

                
                $ie->Navigate("http://us.itrc.hp.com/service/ewarranty/warrantyResults.do?$parms");

#              $ie->{Toolbar} = 0;
#              $ie->{StatusBar} = 0;
#              $ie->{Width} = 800;
#              $ie->{Height} = 400;
#              $ie->{Left} = 0;
#              $ie->{Top} = 0;

                while( $ie->{Busy} ) {
                                Win32::Sleep(200);
                }

#              $ie->{Visible} = 1;
                Win32::Sleep(1000);

#              return "" unless defined $ie->Document;
                # Close IE before bailing out so no browser process is left behind
                unless ($ie->Document && $ie->Document->Body) {
                                $ie->Quit;
                                return "";
                }
#              return "" unless defined $ie->Document->Body->{InnerHTML};

                my $res = $ie->Document->Body->{InnerHTML};

                $ie->Quit;

                return "" unless $res =~ /Start Date/;

                $res =~ s/\n//g;
                $res =~ s/\r//g;

                $res =~ s/^..*<TD colSpan=2><B>Start Date<\/B><\/TD><TD colSpan=2>//;

                $res =~ s/<\/TD><\/TR>..*//g;

                $res =~ s/^ +//;
                $res =~ s/ +$//;

                return $res;
}
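
In case it helps, the parsing at the end can be split out into its own 
function so it can be tried without IE. This is only a sketch: 
extract_start_date is a name I made up, the sample HTML is invented rather 
than captured from the HP site, and the pattern simply assumes the same table 
layout as the regexes above (matched case-insensitively, which the original 
does not do).

```perl
use strict;
use warnings;

# Hypothetical helper: pull the warranty start date out of HTML shaped like
# the table row the regexes above assume. Returns "" if nothing matches.
sub extract_start_date {
    my ($html) = @_;
    return "" unless defined $html;

    # Flatten line breaks first, as the original code does.
    $html =~ s/[\r\n]//g;

    # Capture the cell that follows the "Start Date" label cell.
    if ($html =~ m{<TD colSpan=2><B>Start Date</B></TD><TD colSpan=2>(.*?)</TD></TR>}i) {
        my $date = $1;
        $date =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        return $date;
    }
    return "";
}

# Made-up sample input, not real output from the HP site.
my $sample = "<TR><TD colSpan=2><B>Start Date</B></TD>\r\n"
           . "<TD colSpan=2> 01 Jan 2010 </TD></TR>";
print extract_start_date($sample), "\n";
```

With something like this in place, GetWarranty could call $ie->Quit and then 
finish with "return extract_start_date($res);".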

From: perl-win32-users-boun...@listserv.activestate.com 
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Greg 
Aiken
Sent: Friday, June 08, 2012 3:23 PM
To: Perl-Win32-Users@listserv.activestate.com
Subject: naive LWP::Get question (perhaps JavaScript related?)

hello perl users,  today i am struggling to better understand what the 
underlying issue is here.

my employer uses a web based CRM system called 'salesforce.com'.

using a web browser, i log into this site, i then use its web interface  and 
eventually display a 'customer record' web page in my browser.  some of the 
elements i see on the page include 'products they own', 'contacts', 'recent 
activities', etc...

if i;

a. using the web browser, 'view page source' i can actually see/read the ascii 
html code fragments that clearly list the customers contact names, email 
addresses, etc...  in other words, what i see on the screen of my rendered web 
page (in the browser), im able to read the full underlying html code fragments 
and data fragments when 'viewing page source'.

b. save as the currently viewed web page to my local hard drive, and i open the 
resulting *.htm file with a text editor, i am once again able to see all 
elements, to include the customers contact names and their emails.

now i had the thought, using LWP::Simple i should be able to simply get this 
same URL thats presented in my web browser.

my first test.

1. i copied this URL and opened a new tab in my web browser that already had a 
login session going with salesforce.com.  in the address field of the new tab, 
i pasted the URL, then hit the 'enter' key.  voila, the exact same page (as 
was currently displayed on a different tab in my browser) also displayed 
equally well with all data being displayed.

2. i then wrote an LWP::Simple script where i pasted the exact same URL and ran 
my script.  my one liner...

use LWP::Simple;

my $HTTP_response_code = LWP::Simple::mirror($url, 'test000.htm');
print $HTTP_response_code;

shows a status code of 200 (page retrieved), and resulted in a file 
'test000.htm' being written into the cwd.  however, when i view the contents of 
the file saved, its nothing close to 'browsers - view page source' or to the 
contents of a web page saved locally from within the web browser.

my only guess here is that perhaps some elements of the page are dynamically 
created via javascript, or other client browser technology - which would be 
lacking from LWP::Simple.

if that is the reason, does anyone know if there is a notion of 'simulating a 
browser' via a Perl script so i could do more than use HTTP get, but instead 
simulate the full function of what a 'normal browser' would do to essentially 
create the full contents of a page using JavaScript, so that when i then save 
the contents of the page to a file to evaluate, its got all dynamic content in 
place, and nothing is missing.

perhaps there are other reasons why i am getting this behavior.  didnt know if 
any others have tried hacking at web pages in this manner before and might have 
had a similar experience.

greg
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
