Re: How to recover text from a web page
Thanks for the lead on screen scraping, but the problem is that there is nothing to scrape. Both put URL(...) and revBrowserGet(tBrowserId, "htmltext") return the HTML, but not all of the text that is displayed on the page. In fact, if I use Word's merge-documents tool to compare the HTML from pages 2, 9, and 256 of the petition, there is NO DIFFERENCE between the files. The petition signatures and comments are embedded in a petition widget, I think, which I suppose is some JavaScript applet. Whatever it is, the HTML definitely does not contain the petition text that I want to evaluate.

Nevertheless, it is trivial to manually select and copy all of the text on the page. Once it is copied, it is easy to automatically paste it, scrape it (that code works fine), and store the data using LiveCode, but I do not see a way to select and copy text from this widget using LiveCode.

On Tue, 21 Sep 2010 22:23:17, stephen barncard wrote:
> Why bother with revBrowser at all? Just do this in the message box:
>
>   put URL("http://website.com/page.html")
>
> and this will put the website HTML into the message box output. Obviously you could do this with fields.
>
> Check out Jerry's videos on screen scraping:
> http://revmentor.com/business-logic-screen-scraping-1
> http://revmentor.com/business-logic-screen-scraping-0
>
> On 21 September 2010 22:16, Sumner, Walt WSUMNER at dom.wustl.edu wrote:
>> I am trying to recover text from this web page and all of its siblings:
>> http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
>>
>> The interesting part of the page is the comments, which do not appear in the HTML but which can be copied manually. I can open this page in a browser in LiveCode. With manual mouse motions, I can double-click a block of text, choose Select All from the Edit menu, choose Copy from the Edit menu, and then paste into a field, where the comments all appear and are easy to disassemble.
>>
>> Unfortunately, the revBrowser set command and get function do not do anything comparable, AFAICT. The Select All choice is not implemented in the DoMenu command. I think that printing a PDF is also out. So, any thoughts on how to automate this part of a petition review? For instance, maybe there is a simple way to save the text to a file with the revBrowserExecuteScript function (using JavaScript for Safari)?
>>
>> BTW, the browser is fully capable of crashing LiveCode on at least some OS X machines. Please don't lose any work for me.
>>
>> Thanks,
>> Walt

Walton Sumner

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
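Walt's revBrowserExecuteScript idea could be sketched in LiveCode roughly as follows. revBrowserExecuteScript and revBrowserOpen are real revBrowser calls, but the exact JavaScript, the variable and field names, and whether the embedded WebKit instance will hand back text rendered inside a cross-domain widget are all assumptions, not something tested against this page.

```
-- Hedged sketch: ask the embedded browser for the rendered page text.
-- Assumes sBrowserId was set by revBrowserOpen elsewhere in the stack.
-- If the comments live in a cross-domain iframe, the browser's security
-- model may return empty here.
local sBrowserId

on grabRenderedText
   local tText
   put revBrowserExecuteScript(sBrowserId, \
         "document.body.innerText") into tText
   if tText is not empty then
      put tText into field "Scraped Text"  -- hypothetical field name
   end if
end grabRenderedText
```

If this works at all, the returned text should be the same thing the manual Select All / Copy sequence produces, ready for the existing scraping code.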
Re: How to recover text from a web page
Perhaps there's an iframe or include in the HTML that references another page.

On 22 September 2010 12:14, Sumner, Walt wsum...@dom.wustl.edu wrote:
> Thanks for the lead on screen scraping, but the problem is that there is nothing to scrape. The put URL(...) and revBrowserGet(tBrowserId, "htmltext") calls return the HTML, but not all of the text that is displayed on the page.

--
Stephen Barncard
San Francisco Ca. USA
more about sqb: http://www.google.com/profiles/sbarncar
Re: How to recover text from a web page
> perhaps there's an iFrame or include in the html that references another page

There is definitely an iframe involved, and they have it pointing to some sort of PHP servlet hosted on an entirely different domain. Doesn't look to be much you can do with that combination.

Best regards,
David C.
Re: How to recover text from a web page
Try using the URL of the iframe. This will get that iframe's HTML only, but that could be all you need. Most sites will work easily, but some send a security variable along with the iframe URL that signals 'intended use'.

An include, by contrast, should simply add the text of another file to the current file, effectively inserting that text at the location of the include. In that case you would get the HTML without any extra steps.

On Sep 22, 2010, at 6:45 PM, David C. wrote:
>> perhaps there's an iFrame or include in the html that references another page
>
> There is definitely an iFrame involved and they have it pointing to some sort of PHP servlet hosted from an entirely different domain. Doesn't look to be much you can do with that combination.

Jim Ault
Las Vegas
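Jim's "fetch the iframe URL directly" suggestion could look something like this in LiveCode. The chunk handling assumes the src attribute is double-quoted and belongs to the first <iframe> in the page; a real page may need a more careful parse, and the security variable Jim mentions may still block the request.

```
-- Hedged sketch: find the first iframe's src in fetched HTML,
-- then fetch that URL directly. Assumes src="..." with double quotes.
function iframeHTML pPageURL
   local tHTML, tTag, tStart, tEnd, tSrc
   put URL pPageURL into tHTML
   put offset("<iframe", tHTML) into tStart
   if tStart = 0 then return empty
   -- keep everything from the iframe tag onward, then isolate src="..."
   put char tStart to -1 of tHTML into tTag
   put offset("src=" & quote, tTag) into tStart
   if tStart = 0 then return empty
   delete char 1 to (tStart + 4) of tTag   -- remove through the opening quote
   put offset(quote, tTag) into tEnd       -- find the closing quote
   put char 1 to (tEnd - 1) of tTag into tSrc
   return URL tSrc                          -- fetch the iframe's own HTML
end iframeHTML
```

Whether the returned HTML actually contains the comments depends on whether the servlet checks for that 'intended use' variable.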
Re: How to recover text from a web page
Why bother with revBrowser at all? Just do this in the message box:

  put URL("http://website.com/page.html")

and this will put the website HTML into the message box output. Obviously you could do this with fields.

Check out Jerry's videos on screen scraping:
http://revmentor.com/business-logic-screen-scraping-1
http://revmentor.com/business-logic-screen-scraping-0

On 21 September 2010 22:16, Sumner, Walt wsum...@dom.wustl.edu wrote:
> I am trying to recover text from this web page and all of its siblings:
> http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
>
> The interesting part of the page is the comments, which do not appear in the HTML but which can be copied manually. I can open this page in a browser in LiveCode. With manual mouse motions, I can double-click a block of text, choose Select All from the Edit menu, choose Copy from the Edit menu, and then paste into a field, where the comments all appear and are easy to disassemble.
>
> Unfortunately, the revBrowser set command and get function do not do anything comparable, AFAICT. The Select All choice is not implemented in the DoMenu command. I think that printing a PDF is also out. So, any thoughts on how to automate this part of a petition review? For instance, maybe there is a simple way to save the text to a file with the revBrowserExecuteScript function (using JavaScript for Safari)?
>
> BTW, the browser is fully capable of crashing LiveCode on at least some OS X machines. Please don't lose any work for me.
>
> Thanks,
> Walt

--
Stephen Barncard
San Francisco Ca. USA
more about sqb: http://www.google.com/profiles/sbarncar
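The message-box one-liner above, expanded into a hedged handler. The field name "Raw HTML" is made up for illustration; put URL and the result behave as described in the LiveCode docs.

```
-- Sketch: fetch a page's HTML and stash it in a field for scraping.
-- "Raw HTML" is a hypothetical field name.
on fetchPage pURL
   put URL pURL into field "Raw HTML"
   if the result is not empty then
      answer "Download failed:" && the result
   end if
end fetchPage
```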
Re: How to recover text from a web page
Hi Walt,

Umm, if I got it right, you simply want to get the source code of that certain page? If that's what you want to do, you can just use the revBrowserGet() function to retrieve its "htmltext" property.

Best regards,
Shedo Surashu
www.ShadowSlash.tk
Connect with me on LinkedIn. (http://ph.linkedin.com/in/shadowslash)

--- On Wed, 22/9/10, Sumner, Walt wsum...@dom.wustl.edu wrote:

> From: Sumner, Walt wsum...@dom.wustl.edu
> Subject: How to recover text from a web page
> To: use-revolution@lists.runrev.com
> Date: Wednesday, 22 September, 2010, 5:16 AM
>
> I am trying to recover text from this web page and all of its siblings:
> http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
>
> The interesting part of the page is the comments, which do not appear in the HTML but which can be copied manually. I can open this page in a browser in LiveCode. With manual mouse motions, I can double-click a block of text, choose Select All from the Edit menu, choose Copy from the Edit menu, and then paste into a field, where the comments all appear and are easy to disassemble.
>
> Unfortunately, the revBrowser set command and get function do not do anything comparable, AFAICT. The Select All choice is not implemented in the DoMenu command. I think that printing a PDF is also out. So, any thoughts on how to automate this part of a petition review? For instance, maybe there is a simple way to save the text to a file with the revBrowserExecuteScript function (using JavaScript for Safari)?
>
> BTW, the browser is fully capable of crashing LiveCode on at least some OS X machines. Please don't lose any work for me.
>
> Thanks,
> Walt
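As a snippet, the suggestion above would look like this. revBrowserGet with the "htmltext" property is documented revBrowser usage; note, though, that as Walt reports earlier in the thread, it returns only the page source, not the text rendered inside the widget.

```
-- Sketch: read the current page source from an open browser instance.
-- Assumes tBrowserId holds the id returned by revBrowserOpen.
put revBrowserGet(tBrowserId, "htmltext") into tPageSource
```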