Have you looked at WWW::Salesforce?

It requires a little more setup than a straight LWP call (and relies on
SOAP::Lite), but is going to be a lot more reliable than screen scraping.
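
For what it's worth, a minimal sketch of that approach might look like the following. It assumes WWW::Salesforce is installed and that API access is enabled for your login. The helper names (contact_query, fetch_contacts), the SFDC_USER / SFDC_PASS environment variables, and the choice of the Name and Email fields on Contact are all made up for illustration, so adjust them to whatever your org actually uses.

```perl
use strict;
use warnings;

# Build a SOQL query for the contacts attached to one account.
# The field list is an assumption; pick the fields you really need.
sub contact_query {
    my ($account_id) = @_;
    # Salesforce record IDs are 15 or 18 alphanumeric characters.
    die "bad account id\n"
        unless $account_id =~ /\A[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?\z/;
    return "SELECT Name, Email FROM Contact WHERE AccountId = '$account_id'";
}

# Log in through the SOAP API and return the matching contact records.
sub fetch_contacts {
    my ($user, $pass, $account_id) = @_;
    require WWW::Salesforce;    # loaded lazily so contact_query works without it
    my $sforce = WWW::Salesforce->login(username => $user, password => $pass)
        or die "Salesforce login failed\n";
    my $result  = $sforce->query(query => contact_query($account_id));
    my @records = $result->valueof('//queryResponse/result/records');
    return @records;
}

# Only talk to Salesforce when credentials and an account ID are supplied.
if (@ARGV == 1 && $ENV{SFDC_USER} && $ENV{SFDC_PASS}) {
    for my $c (fetch_contacts($ENV{SFDC_USER}, $ENV{SFDC_PASS}, $ARGV[0])) {
        # Assumes each record comes back as a hash keyed by field name.
        printf "%s <%s>\n", $c->{Name} // '', $c->{Email} // '';
    }
}
```

Depending on how your org is set up, the password may need your security token appended to it.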

-Conor
On Jun 8, 2012 1:08 PM, "Ken Cornetet" <ken.corne...@kimball.com> wrote:

> If you are on Windows, you can “drive” IE via OLE and screen-scrape the
> results.
>
> Here’s a bit of code I used some time ago (way back when we had physical
> servers) to get the warranty information from an HP web site.
>
> use Win32;         # provides Win32::Sleep
> use Win32::OLE;
>
> sub GetWarranty {
>     my $serial = shift;
>     my $prod   = shift;
>
>     # Query string for the HP warranty lookup
>     my $parms = 'country=US&' .
>                 "serialNumber1=$serial&" .
>                 'BODServiceID=NA&' .
>                 "productNumber=$prod&" .
>                 'RegisteredPurchaseDate=';
>
>     my $ie = Win32::OLE->new('InternetExplorer.Application');
>
>     $ie->Navigate("http://us.itrc.hp.com/service/ewarranty/warrantyResults.do?$parms");
>
> #   $ie->{Toolbar} = 0;
> #   $ie->{StatusBar} = 0;
> #   $ie->{Width} = 800;
> #   $ie->{Height} = 400;
> #   $ie->{Left} = 0;
> #   $ie->{Top} = 0;
>
>     # Wait for the page to finish loading
>     while ( $ie->{Busy} ) {
>         Win32::Sleep(200);
>     }
>
> #   $ie->{Visible} = 1;
>     Win32::Sleep(1000);
>
> #   return "" unless defined $ie->Document;
>     return "" unless $ie->Document;
>     return "" unless $ie->Document->Body;
> #   return "" unless defined $ie->Document->Body->{InnerHTML};
>
>     my $res = $ie->Document->Body->{InnerHTML};
>
>     $ie->Quit;
>
>     return "" unless $res =~ /Start Date/;
>
>     # Flatten the HTML to one line, then cut away everything around the date
>     $res =~ s/\n//g;
>     $res =~ s/\r//g;
>
>     $res =~ s/^..*<TD colSpan=2><B>Start Date<\/B><\/TD><TD colSpan=2>//;
>     $res =~ s/<\/TD><\/TR>..*//g;
>
>     # Trim leading and trailing spaces
>     $res =~ s/^ +//;
>     $res =~ s/ +$//;
>
>     return $res;
> }
>
> From: perl-win32-users-boun...@listserv.activestate.com
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
> Greg Aiken
> Sent: Friday, June 08, 2012 3:23 PM
> To: Perl-Win32-Users@listserv.activestate.com
> Subject: naive LWP::Get question (perhaps JavaScript related?)
>
> hello perl users,  today i am struggling to better understand what the
> underlying issue is here.
>
> my employer uses a web based CRM system called 'salesforce.com'.
>
> using a web browser, i log into this site, i then use its web interface
> and eventually display a 'customer record' web page in my browser.  some
> of the elements i see on the page include 'products they own', 'contacts',
> 'recent activities', etc...
>
> if i:
>
> a. use the web browser's 'view page source', i can actually see/read the
> ascii html code fragments that clearly list the customer's contact names,
> email addresses, etc...  in other words, what i see on the screen of my
> rendered web page (in the browser), i'm able to read as full underlying
> html code fragments and data fragments when 'viewing page source'.
>
> b. save the currently viewed web page to my local hard drive and open the
> resulting *.htm file with a text editor, i am once again able to see all
> elements, including the customer's contact names and their emails.
>
> now i had the thought, using LWP::Get i should be able to simply get this
> same URL that's presented in my web browser.
>
> my first test.
>
> 1. i copied this URL and opened a new tab in my web browser that already
> had a login session going with salesforce.com.  in the address field of
> the new tab, i pasted the URL, then hit the 'enter' key.  voila, the exact
> same page (as was currently displayed on a different tab in my browser)
> also displayed equally well with all data being displayed.
>
> 2. i then wrote an LWP::Get script where i pasted the exact same URL and
> ran my script.  my one liner...
>
> my $HTTP_response_code = LWP::Simple::mirror $url, 'test000.htm';
> print $HTTP_response_code;
>
> shows a status code of 200 (page retrieved), and resulted in a file
> 'test000.htm' being written into the cwd.  however, when i view the
> contents of the saved file, it's nothing close to the browser's 'view page
> source' or to the contents of a web page saved locally from within the web
> browser.
>
> my only guess here is that perhaps some elements of the page are
> dynamically created via javascript, or other client browser technology,
> which would be lacking from LWP::Get.
>
> if that is the reason, does anyone know if there is a notion of
> 'simulating a browser' via a Perl script, so i could do more than an HTTP
> get and instead simulate the full function of what a 'normal browser'
> would do to create the full contents of a page using JavaScript?  then,
> when i save the contents of the page to a file to evaluate, it's got all
> dynamic content in place, and nothing is missing.
>
> perhaps there are other reasons why i am getting this behavior.  didn't
> know if any others have tried hacking at web pages in this manner before
> and might have had a similar experience.
>
> greg
>
> _______________________________________________
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs