Re: Website scraping - How can I load a 'partial' page?

2017-12-13 Thread Mike Bonner via use-livecode
Hmm, or use range as mentioned in my other mail.

If the server supports range requests, you can set your headers to include
Range: bytes=0-1999
to get the first 2000 bytes (byte ranges are inclusive and start at 0).

Or use curl with -r 0-1999, but I have yet to find a page that will return
only a range.
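
A minimal Python sketch of the Range-header idea above, not code from this thread. The function names build_range_request and fetch_prefix are made up for illustration, and any URL you pass in is your own. It asks for the first N bytes and reports whether the server answered with 206 Partial Content or ignored the header and started sending the whole page (200).

```python
import urllib.request


def build_range_request(url, first_n_bytes):
    """Build a GET request asking only for the first `first_n_bytes` bytes."""
    if first_n_bytes < 1:
        raise ValueError("first_n_bytes must be at least 1")
    # HTTP byte ranges are inclusive, so the first 2000 bytes are 0-1999.
    last_byte = first_n_bytes - 1
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{last_byte}"})


def fetch_prefix(url, first_n_bytes, timeout=10):
    """Return (server_honoured_range, body_bytes) for the start of a page."""
    request = build_range_request(url, first_n_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 means the range was honoured; 200 means the header was ignored
        # and the server is sending the full page.
        honoured = response.status == 206
        # read(n) returns at most n bytes in either case.
        body = response.read(first_n_bytes)
    return honoured, body
```

If the first value returned by fetch_prefix is False, the server ignored the Range header, which matches the experience described above; the read still stops after the requested number of bytes.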

Apparently you can find out whether a page will accept ranges by using curl
with something like this:

curl -I http://i.imgur.com/z4d4kWk.jpg

HTTP/1.1 200 OK
...
Accept-Ranges: bytes
Content-Length: 146515

If the response includes "Accept-Ranges: bytes", range requests should work.
I still think the intermediary method is best.
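
The same header check can be sketched in Python. This is only an illustration under assumptions: supports_byte_ranges and head_headers are hypothetical helper names, not part of any library. Note that a server may still honour ranges without advertising them, so a False result is a hint rather than proof.

```python
import urllib.request


def supports_byte_ranges(headers):
    """Return True if the given headers advertise 'Accept-Ranges: bytes'."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "accept-ranges":
            return value.strip().lower() == "bytes"
    return False


def head_headers(url, timeout=10):
    """Send a HEAD request (what `curl -I` does) and return the headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

For example, supports_byte_ranges(head_headers(some_url)) would report whether the response to a HEAD request carries the header shown in the curl output above.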



On Wed, Dec 13, 2017 at 8:39 AM, Mike Bonner  wrote:

> I suppose one could use sockets and partial GET requests (using a Range:
> header), but I suspect it would be easier to just use an intermediary
> server to handle things.  To test, I set up an extremely simple page with
> the following:
>
> <?lc
>  put $_GET["page"] into tPage -- a GET request to my page, of the form ?page=http://url.goes.here
>  put char 1 to 6000 of url tPage -- request the page to be scraped and return the first 6000 chars
> ?>
>
> Using it is simple:
> get URL "http://path.to.my.page.com/scrape.lc?page=http://server.to.scrape.com/pagetoscrape.html"
>
> If the page to be scraped uses a GET-style request itself, it might be
> better to use POST instead.
>
> In this way you can use a server on a fast connection to do the heavy
> lifting and then just send the results back down.  In fact, you could
> probably have the server itself do the scraping and just return any final
> results (or pop the results into a database or whatever).  Also, if you
> have enough control of the server and need to scrape the same page over
> and over for changes, you could most likely set up a cron job to do the
> work and a front end to pull the results.  (I don't know what your final
> objective is, so it's hard to say what's best.)
>
>
>
> On Wed, Dec 13, 2017 at 6:39 AM, Roger Eller via use-livecode <
> use-livecode@lists.runrev.com> wrote:
>
>> I have a webpage that I grab with LiveCode, then parse out what I need.
>> The data I keep is within the first 1/4th of the page.
>>
>> Rather than loading the entire page into a variable or a browser object,
>> how can I load just the portion that I need and then stop the transmission
>> instead of wasting the time and bandwidth to load the entire page?
>>
>> ~Roger
>> ___
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>
>
>
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Website scraping - How can I load a 'partial' page?

2017-12-13 Thread Rick Harrison via use-livecode
Hi Roger,

I don’t know whose webpage you are
scraping, but if it is a third party’s webpage,
make sure that you are not violating their
terms of agreement or infringing on their
copyright.  You might want to ask for their
permission to do so, to make sure you are
safe and legal.

If it is your own webpage, then feel perfectly
at ease scraping away.

Cheers,

Rick

> On Dec 13, 2017, at 8:39 AM, Roger Eller via use-livecode 
>  wrote:
> 
> I have a webpage that I grab with LiveCode, then parse out what I need.
> The data I keep is within the first 1/4th of the page.
> 
> Rather than loading the entire page into a variable or a browser object,
> how can I load just the portion that I need and then stop the transmission
> instead of wasting the time and bandwidth to load the entire page?
> 
> ~Roger

Re: Website scraping - How can I load a 'partial' page?

2017-12-13 Thread Mike Bonner via use-livecode
I suppose one could use sockets and partial GET requests (using a Range:
header), but I suspect it would be easier to just use an intermediary
server to handle things.  To test, I set up an extremely simple page with
the following:

<?lc
 put $_GET["page"] into tPage -- a GET request to my page, of the form ?page=http://url.goes.here
 put char 1 to 6000 of url tPage -- request the page to be scraped and return the first 6000 chars
?>

Using it is simple:
get URL "http://path.to.my.page.com/scrape.lc?page=http://server.to.scrape.com/pagetoscrape.html"

If the page to be scraped uses a GET-style request itself, it might be better
to use POST instead.

In this way you can use a server on a fast connection to do the heavy lifting
and then just send the results back down.  In fact, you could probably have
the server itself do the scraping and just return any final results (or pop
the results into a database or whatever).  Also, if you have enough
control of the server and need to scrape the same page over and over for
changes, you could most likely set up a cron job to do the work and a front
end to pull the results.  (I don't know what your final objective is, so
it's hard to say what's best.)
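
As a rough Python sketch of the "stop after the part you need" idea (an illustration only, not the LiveCode server script above): read_prefix and scrape_prefix are hypothetical names, and the limit is counted in bytes rather than chars. The reader pulls the response in small chunks and stops once the limit is reached, after which the connection is closed instead of draining the rest of the page.

```python
import io
import urllib.request


def read_prefix(stream, limit, chunk_size=1024):
    """Read at most `limit` bytes from a file-like object, then stop."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # the stream ended before the limit was reached
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def scrape_prefix(url, limit=6000, timeout=10):
    """Fetch only the first `limit` bytes of a page (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Leaving the with-block closes the connection, so the remainder
        # of the page is not read by this code.
        return read_prefix(response, limit)


if __name__ == "__main__":
    # Offline demonstration with an in-memory stream standing in for a page.
    fake_page = io.BytesIO(b"x" * 10000)
    print(len(read_prefix(fake_page, 6000)))
```

Some bytes beyond the limit may already have arrived in network buffers by the time the connection closes, so the saving depends on the page size and the connection.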



On Wed, Dec 13, 2017 at 6:39 AM, Roger Eller via use-livecode <
use-livecode@lists.runrev.com> wrote:

> I have a webpage that I grab with LiveCode, then parse out what I need.
> The data I keep is within the first 1/4th of the page.
>
> Rather than loading the entire page into a variable or a browser object,
> how can I load just the portion that I need and then stop the transmission
> instead of wasting the time and bandwidth to load the entire page?
>
> ~Roger


Website scraping - How can I load a 'partial' page?

2017-12-13 Thread Roger Eller via use-livecode
I have a webpage that I grab with LiveCode, then parse out what I need.
The data I keep is within the first 1/4th of the page.

Rather than loading the entire page into a variable or a browser object,
how can I load just the portion that I need and then stop the transmission
instead of wasting the time and bandwidth to load the entire page?

~Roger