Re: How to recover text from a web page

2010-09-22 Thread Sumner, Walt
Thanks for the lead on screen scrapes, but the problem is that there is
nothing to scrape. Both put URL(...) and revBrowserGet(tBrowserId,
"htmltext") return the HTML, but not all of the text that is displayed
on the page.

In fact, if I use Word's merge-documents tool to compare the HTML from pages
2, 9, and 256 of the petition, there is NO DIFFERENCE in the files. The
petition signatures and comments are embedded in a petition widget, I think,
which I suppose is some JavaScript applet. Whatever it is, the HTML
definitely does not contain the petition text that I want to evaluate.

Nevertheless, it is trivial to manually select and copy all of the text on
the page. Once it is copied, it is easy to automatically paste it, scrape it
(that code works fine), and store the data using LiveCode, but I do not see a
way to select and copy text from this widget using LiveCode.
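
One avenue that might be worth trying (a rough sketch, not tested; it
assumes tBrowserId holds a valid instance id from revBrowserOpen, and that
revBrowserExecuteScript returns the evaluated value when running JavaScript
on OS X) is asking the embedded browser for its rendered text directly:

```livecode
-- sketch: recover the *rendered* text, not the raw HTML, from revBrowser
-- tBrowserId is assumed to hold a valid id from an earlier revBrowserOpen
put revBrowserExecuteScript(tBrowserId, \
      "document.body.innerText;") into tPageText
-- tPageText should now hold roughly what Select All / Copy grabs by hand
```

If that works on this page, it would sidestep the Select All / Copy dance
entirely.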

 On Tue, 21 Sep 2010 22:23:17, stephen barncard wrote:
 Why bother with revBrowser at all? Just do this in the message box:
 
 put URL "http://website.com/page.html"
 
 and this will put the website HTML into the message box output. Obviously
 you could do this with fields.
 
 Check out Jerry's videos on Screen Scraping:
 
 http://revmentor.com/business-logic-screen-scraping-1
 http://revmentor.com/business-logic-screen-scraping-0
 
 
 On 21 September 2010 22:16, Sumner, Walt WSUMNER at dom.wustl.edu wrote:
 
 I am trying to recover text from this web page and all of its siblings:
 
 
 http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
 
 The interesting part of the page is the comments, which do not appear in
 the HTML, but which can be copied manually. I can open this page in a
 browser in LiveCode. With manual mouse motions, I can double-click a block
 of text, choose "Select All" from the Edit menu, choose "Copy" from the
 Edit menu, and then paste into a field where the comments all appear and
 are easy to disassemble.
 
 Unfortunately, the revBrowser set command and get function do not do
 anything comparable AFAICT. The "Select All" choice is not implemented in
 the DoMenu command. I think that printing a PDF is also out. So, any
 thoughts on how to automate this part of a petition review? For instance,
 maybe there is a simple way to save the text to a file with the
 revBrowserExecuteScript function (using JavaScript for Safari)?
 
 BTW, the browser is fully capable of crashing LiveCode on at least some
 OS X machines. Please don't lose any work on my account.
 
 Thanks,
 
 Walt
 

Walton Sumner
 


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: How to recover text from a web page

2010-09-22 Thread stephen barncard
perhaps there's an iframe or include in the HTML that references another
page --

On 22 September 2010 12:14, Sumner, Walt wsum...@dom.wustl.edu wrote:

 Thanks for the lead on screen scrapes, but the problem is that there is
 nothing to scrape. Both put URL(...) and revBrowserGet(tBrowserId,
 "htmltext") return the HTML, but not all of the text that is displayed
 on the page.



Stephen Barncard
San Francisco Ca. USA

more about sqb  http://www.google.com/profiles/sbarncar


Re: How to recover text from a web page

2010-09-22 Thread David C.
 perhaps there's an iframe or include in the HTML that references another
 page --


There is definitely an iFrame involved and they have it pointing to
some sort of PHP servlet hosted from an entirely different domain.
Doesn't look to be much you can do with that combination.

Best regards,
David C.


Re: How to recover text from a web page

2010-09-22 Thread Jim Ault
Try using the url of the iframe. This will get that HTML only, but that
could be all you need.

Most sites will work easily, but some have a security variable that is
sent along to the iframe url that signals 'intended use'.

The include should simply add the text of another file to the current
file, effectively inserting that text at the location of the include.
The result should be that you will get the HTML without any other steps.
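
A minimal sketch of that iframe approach (untested; the regex pattern, and
the assumption that the src is an absolute, double-quoted URL, are mine):

```livecode
-- sketch: fetch the page, pull out the iframe's src, then fetch that URL
-- LiveCode string literals cannot escape quotes, so build the pattern
-- with the quote constant
put URL "http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/" into tHTML
put "(?i)<iframe[^>]*src=" & quote & "([^" & quote & "]+)" & quote into tPattern
if matchText(tHTML, tPattern, tFrameURL) then
   put URL tFrameURL into tFrameHTML
   -- tFrameHTML should now hold the signature/comment markup, if no
   -- security variable is required
end if
```

If the src turns out to be relative, it would need to be resolved against
the page's domain first.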



On Sep 22, 2010, at 6:45 PM, David C. wrote:

 perhaps there's an iframe or include in the HTML that references another
 page --

 There is definitely an iFrame involved and they have it pointing to
 some sort of PHP servlet hosted from an entirely different domain.
 Doesn't look to be much you can do with that combination.



Jim Ault
Las Vegas





Re: How to recover text from a web page

2010-09-21 Thread stephen barncard
Why bother with revBrowser at all? Just do this in the message box:

put URL "http://website.com/page.html"

and this will put the website HTML into the message box output. Obviously
you could do this with fields.

Check out Jerry's videos on Screen Scraping:

http://revmentor.com/business-logic-screen-scraping-1
http://revmentor.com/business-logic-screen-scraping-0


On 21 September 2010 22:16, Sumner, Walt wsum...@dom.wustl.edu wrote:

 I am trying to recover text from this web page and all of its siblings:


 http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1

 The interesting part of the page is the comments, which do not appear in
 the HTML, but which can be copied manually. I can open this page in a
 browser in LiveCode. With manual mouse motions, I can double-click a block
 of text, choose "Select All" from the Edit menu, choose "Copy" from the
 Edit menu, and then paste into a field where the comments all appear and
 are easy to disassemble.

 Unfortunately, the revBrowser set command and get function do not do
 anything comparable AFAICT. The "Select All" choice is not implemented in
 the DoMenu command. I think that printing a PDF is also out. So, any
 thoughts on how to automate this part of a petition review? For instance,
 maybe there is a simple way to save the text to a file with the
 revBrowserExecuteScript function (using JavaScript for Safari)?

 BTW, the browser is fully capable of crashing LiveCode on at least some
 OS X machines. Please don't lose any work on my account.

 Thanks,

 Walt




-- 
Stephen Barncard
San Francisco Ca. USA

more about sqb  http://www.google.com/profiles/sbarncar


Re: How to recover text from a web page

2010-09-21 Thread Shadow Slash
Hi Walt,

Umm, if I got it right, you simply want to get the source code of that
page? If so, you can just use the revBrowserGet() function to retrieve its
htmltext property.
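
For example (a sketch, assuming tBrowserId holds a valid instance id from
an earlier revBrowserOpen call):

```livecode
-- sketch: grab the page source from an open revBrowser instance
put revBrowserGet(tBrowserId, "htmltext") into tSource
```

Though, as Walt notes upthread, on this particular page the htmltext holds
only the outer page's markup, not the widget's comment text.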

Best regards,
Shedo Surashu
www.ShadowSlash.tk

Connect with me on LinkedIn. (http://ph.linkedin.com/in/shadowslash)


--- On Wed, 22/9/10, Sumner, Walt wsum...@dom.wustl.edu wrote:

 From: Sumner, Walt wsum...@dom.wustl.edu
 Subject: How to recover text from a web page
 To: use-revolution@lists.runrev.com use-revolution@lists.runrev.com
 Date: Wednesday, 22 September, 2010, 5:16 AM
 I am trying to recover text from this
 web page and all of its siblings:
 
 http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
 
 The interesting part of the page is the comments, which do
 not appear in the HTML, but which can be copied manually. I
 can open this page in a browser in LiveCode. With manual
 mouse motions, I can double click a block of text, choose
 Select All from the Edit menu, choose Copy from the
 Edit menu, and then paste into a field where the comments
 all appear and are easy to disassemble. 
 
 Unfortunately, the revbrowser set command and get function
 do not do anything comparable AFAICT. The Select All
 choice is not implemented in the DoMenu command. I think
 that printing a pdf is also out. So, any thoughts on how to
 automate this part of a petition review? For instance, maybe
 there is a simple way to save the text to a file with the
 revBrowserExecuteScript function (using JavaScript for
 Safari)?
 
 BTW, the browser is fully capable of crashing LiveCode on
 at least some OSX machines. Please don't lose any work for
 me.
 
 Thanks,
 
 Walt
 


