Have you looked at WWW::Salesforce? It requires a little more setup than a straight LWP call (and relies on SOAP::Lite), but it is going to be a lot more reliable than screen scraping, since it talks to Salesforce's SOAP API instead of parsing HTML pages.
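If you do stay with a straight LWP call, the symptom Greg describes below (status 200, but a different page) most likely comes from the login session rather than from JavaScript. The new browser tab works because it sends the salesforce.com session cookie, while LWP::Simple::mirror sends no cookies at all, so the server probably answered with a login or redirect page. The fact that 'View Page Source' shows the contact data also suggests the data is in the HTML the server sends. The sketch below assumes that cause and builds a request carrying a copied session cookie with HTTP::Cookies and HTTP::Request (both installed alongside LWP). The host name, the record URL, the cookie name 'sid' and its value are hypothetical placeholders; check the real ones in your browser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

# Hypothetical placeholders: copy the real values from your browser.
my $host = 'na1.salesforce.com';                  # assumed instance host
my $url  = "https://$host/hypothetical-record-id"; # hypothetical record URL
my $sid  = 'PASTE-SESSION-ID-HERE';               # hypothetical session id

# Build a GET request that carries the browser's session cookie.
# The cookie name 'sid' is an assumption, not confirmed in this thread.
sub build_request {
    my ( $url, $host, $sid ) = @_;

    my $jar = HTTP::Cookies->new;
    # set_cookie($version, $key, $val, $path, $domain, $port,
    #            $path_spec, $secure, $maxage, $discard)
    $jar->set_cookie( 0, 'sid', $sid, '/', $host, undef, 0, 0, undef, 1 );

    my $req = HTTP::Request->new( GET => $url );
    # Adds a "Cookie: sid=..." header because host and path match the cookie
    $jar->add_cookie_header($req);
    return $req;
}

my $req = build_request( $url, $host, $sid );
print $req->as_string;
```

To actually fetch the page you would hand the request to LWP::UserAgent, e.g. `my $res = LWP::UserAgent->new->request($req);`, and save `$res->decoded_content`. Whether Salesforce accepts a session cookie copied from another client is not something this thread confirms.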
-Conor

On Jun 8, 2012 1:08 PM, "Ken Cornetet" <ken.corne...@kimball.com> wrote:
> If you are on Windows, you can "drive" IE via OLE and screen-scrape the
> results.
>
> Here's a bit of code I used some time ago (way back when we had physical
> servers) to get the warranty information from an HP web site.
>
>     use strict;
>     use warnings;
>     use Win32;         # for Win32::Sleep
>     use Win32::OLE;    # OLE automation of Internet Explorer
>
>     sub GetWarranty {
>         my $serial = shift;
>         my $prod   = shift;
>
>         my $parms = 'country=US&' .
>                     "serialNumber1=$serial&" .
>                     'BODServiceID=NA&' .
>                     "productNumber=$prod&" .
>                     'RegisteredPurchaseDate=';
>
>         my $ie = Win32::OLE->new('InternetExplorer.Application')
>             or return "";
>
>         $ie->Navigate("http://us.itrc.hp.com/service/ewarranty/warrantyResults.do?$parms");
>
>         # $ie->{Toolbar}   = 0;
>         # $ie->{StatusBar} = 0;
>         # $ie->{Width}     = 800;
>         # $ie->{Height}    = 400;
>         # $ie->{Left}      = 0;
>         # $ie->{Top}       = 0;
>
>         # Poll until IE has finished loading the page
>         while ( $ie->{Busy} ) {
>             Win32::Sleep(200);
>         }
>
>         # $ie->{Visible} = 1;
>         Win32::Sleep(1000);
>
>         # Copy the rendered body, then close IE even if the page is empty
>         my $res = "";
>         if ( $ie->Document && $ie->Document->Body ) {
>             $res = $ie->Document->Body->{InnerHTML};
>         }
>         $ie->Quit;
>
>         return "" unless defined $res && $res =~ /Start Date/;
>
>         $res =~ s/[\r\n]//g;    # flatten the HTML onto one line
>
>         # Strip everything up to and including the "Start Date" header cell...
>         $res =~ s/^.+<TD colSpan=2><B>Start Date<\/B><\/TD><TD colSpan=2>//;
>         # ...and everything from the end of the value cell onward
>         $res =~ s/<\/TD><\/TR>.+//;
>
>         $res =~ s/^ +//;
>         $res =~ s/ +$//;
>
>         return $res;
>     }
>
> From: perl-win32-users-boun...@listserv.activestate.com
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Greg Aiken
> Sent: Friday, June 08, 2012 3:23 PM
> To: Perl-Win32-Users@listserv.activestate.com
> Subject: naive LWP::Get question (perhaps JavaScript related?)
>
> Hello Perl users, today I am struggling to better understand what the
> underlying issue is here.
>
> My employer uses a web-based CRM system called 'salesforce.com'.
>
> Using a web browser, I log into this site, then use its web interface to
> display a 'customer record' page. Some of the elements I see on the page
> include 'products they own', 'contacts', 'recent activities', etc.
>
> If I:
>
> a. use the browser's 'View Page Source', I can actually see and read the
> ASCII HTML fragments that clearly list the customer's contact names,
> email addresses, etc. In other words, everything I see on the rendered
> page is also present in the underlying HTML and data when I view the
> page source.
>
> b. use 'Save As' to save the currently viewed page to my local hard drive
> and open the resulting *.htm file with a text editor, I am again able to
> see all elements, including the customers' contact names and their
> emails.
>
> Now I had the thought that, using LWP::Simple, I should be able to simply
> get the same URL that is shown in my web browser.
>
> My first test:
>
> 1. I copied this URL and opened a new tab in my web browser, which
> already had a login session going with salesforce.com. In the address
> field of the new tab I pasted the URL, then hit the Enter key. Voilà, the
> exact same page (as was currently displayed in a different tab) also
> displayed equally well, with all data shown.
>
> 2. I then wrote an LWP::Simple script into which I pasted the exact same
> URL, and ran the script.
> My one-liner:
>
>     use LWP::Simple ();
>     # $url holds the URL copied from the browser's address bar
>     my $HTTP_response_code = LWP::Simple::mirror($url, 'test000.htm');
>     print $HTTP_response_code;
>
> This prints a status code of 200 (page retrieved), and a file
> 'test000.htm' is written to the current working directory. However, when
> I view the saved file, its contents are nothing like the browser's 'View
> Page Source' output or the page saved from within the browser.
>
> My only guess is that some elements of the page are created dynamically
> via JavaScript or other client-side browser technology, which LWP::Simple
> lacks.
>
> If that is the reason, does anyone know of a way to 'simulate a browser'
> from a Perl script? Instead of a plain HTTP GET, the script would do what
> a normal browser does, including running the page's JavaScript, so that
> when I save the page to a file for evaluation, all of the dynamic content
> is in place and nothing is missing.
>
> Perhaps there are other reasons for this behavior. I didn't know whether
> others have tried hacking at web pages this way and might have had a
> similar experience.
>
> greg
>
> _______________________________________________
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
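Ken's GetWarranty routine quoted above pulls the date out of the page with a chain of substitutions (strip newlines, delete everything before the value, delete everything after it). The same extraction can be sketched as a single capturing match over whatever HTML you end up with, whether it comes from IE's InnerHTML or from a file saved by LWP. The markup pattern is the one Ken's regexes assume (a 'Start Date' header cell followed by the value cell); that is an assumption about that particular page, and a regex is not a general HTML parser, so a module such as HTML::TreeBuilder is the sturdier choice for less predictable markup. The function name `extract_start_date` and the sample HTML are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text of the cell that follows the
# "Start Date" header cell, or "" if the pattern is not found.
# Assumes markup like <TD ...><B>Start Date</B></TD><TD ...>VALUE</TD>.
sub extract_start_date {
    my ($html) = @_;
    return '' unless defined $html;

    # /i: tag case varies between sources; /s: let .*? span newlines
    if ( $html =~ m{<TD[^>]*>\s*<B>\s*Start\s+Date\s*</B>\s*</TD>\s*<TD[^>]*>(.*?)</TD>}is ) {
        my $value = $1;
        $value =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace and newlines
        return $value;
    }
    return '';
}

# Made-up sample page fragment
my $sample = qq{<TABLE><TR><TD colSpan=2><B>Start Date</B></TD><TD colSpan=2>\n  01 Mar 2009  </TD></TR></TABLE>};
print extract_start_date($sample), "\n";    # prints "01 Mar 2009"
```

Capturing the value directly avoids depending on what follows the cell (Ken's version needs the `</TD></TR>` sequence to be followed by more markup) and leaves the original HTML untouched for further matches.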