simple download manager

2014-11-04 Thread Kiuhnm
I wish to automate downloading from a particular site that has some ads
and requires clicking a lot of buttons before the download starts.

What library should I use to handle HTTP?
Also, I need to support big files (> 1 GB), so the library should hand the data
to me chunk by chunk.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: simple download manager

2014-11-04 Thread Chris Angelico
On Wed, Nov 5, 2014 at 1:53 AM, Kiuhnm <gandal...@mail.com> wrote:
> I wish to automate downloading from a particular site that has some ads
> and requires clicking a lot of buttons before the download starts.
>
> What library should I use to handle HTTP?
> Also, I need to support big files (> 1 GB), so the library should hand the
> data to me chunk by chunk.

You may be violating the site's terms of service, so be aware of what
you're doing.

This could be a really simple job (just figure out what the last HTTP
query is, and replicate that), or it could be insanely complicated
(crypto, JavaScript, and/or timestamped URLs could easily be
involved). To start off, I would recommend not writing a single line
of Python code, but just pulling up Mozilla Firefox with Firebug, or
Google Chrome with its built-in inspection tools, or some equivalent, and
watching the exact queries that go through. Once you figure out what
queries are happening, you can figure out how to do them in Python.
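
As a rough illustration of replaying a captured query, here is a minimal
sketch using only the standard library. The URL, query parameters, and header
values are placeholders (nothing here comes from a real site), and
build_replay_request is a hypothetical helper name. It only builds the
request so it can be compared against what the browser sent; it does not
send anything.

```python
import urllib.parse
import urllib.request


def build_replay_request(url, params, headers):
    """Build (but do not send) a GET request mirroring one seen in the browser."""
    query = urllib.parse.urlencode(params)
    full_url = url + ("?" + query if query else "")
    return urllib.request.Request(full_url, headers=headers, method="GET")


# Placeholder values; copy the real ones from the browser's network panel.
req = build_replay_request(
    "https://example.com/download",
    {"file": "12345", "token": "abc"},
    {
        "User-Agent": "Mozilla/5.0",
        "Referer": "https://example.com/file/12345",
    },
)
print(req.full_url)
# Sending it would look like: urllib.request.urlopen(req, timeout=30)
```

If the site checks cookies, timestamps, or tokens, those would have to be
captured and supplied the same way, which is where the complicated cases start.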

ChrisA


Re: simple download manager

2014-11-04 Thread Kiuhnm
On Tuesday, November 4, 2014 4:00:51 PM UTC+1, Chris Angelico wrote:
> On Wed, Nov 5, 2014 at 1:53 AM, Kiuhnm <gandal...@mail.com> wrote:
>> I wish to automate downloading from a particular site that has some
>> ads and requires clicking a lot of buttons before the download
>> starts.
>>
>> What library should I use to handle HTTP?
>> Also, I need to support big files (> 1 GB), so the library should hand the
>> data to me chunk by chunk.
>
> You may be violating the site's terms of service, so be aware of what
> you're doing.
>
> This could be a really simple job (just figure out what the last HTTP
> query is, and replicate that), or it could be insanely complicated
> (crypto, JavaScript, and/or timestamped URLs could easily be
> involved). To start off, I would recommend not writing a single line
> of Python code, but just pulling up Mozilla Firefox with Firebug, or
> Google Chrome with its built-in inspection tools, or some equivalent, and
> watching the exact queries that go through. Once you figure out what
> queries are happening, you can figure out how to do them in Python.
>
> ChrisA

It'll be tricky, I'm sure of that, but if the browser can do it, so can I :)
Fortunately, there are no captchas.


Re: simple download manager

2014-11-04 Thread Kiuhnm
On Tuesday, November 4, 2014 4:10:59 PM UTC+1, Kiuhnm wrote:
> On Tuesday, November 4, 2014 4:00:51 PM UTC+1, Chris Angelico wrote:
>> On Wed, Nov 5, 2014 at 1:53 AM, Kiuhnm <gandal...@mail.com> wrote:
>>> I wish to automate downloading from a particular site that has some
>>> ads and requires clicking a lot of buttons before the download
>>> starts.
>>>
>>> What library should I use to handle HTTP?
>>> Also, I need to support big files (> 1 GB), so the library should hand the
>>> data to me chunk by chunk.
>>
>> You may be violating the site's terms of service, so be aware of what
>> you're doing.
>>
>> This could be a really simple job (just figure out what the last HTTP
>> query is, and replicate that), or it could be insanely complicated
>> (crypto, JavaScript, and/or timestamped URLs could easily be
>> involved). To start off, I would recommend not writing a single line
>> of Python code, but just pulling up Mozilla Firefox with Firebug, or
>> Google Chrome with its built-in inspection tools, or some equivalent, and
>> watching the exact queries that go through. Once you figure out what
>> queries are happening, you can figure out how to do them in Python.
>>
>> ChrisA
>
> It'll be tricky, I'm sure of that, but if the browser can do it, so can I :)
> Fortunately, there are no captchas.

There are no captchas, but the site is behind Cloudflare (DDoS protection).
Anyway, I now know what to do. To deal with Cloudflare's JavaScript challenge
I'm going to use jsdb, a neat little JavaScript interpreter.
By the way, I'm using requests instead of urllib, but I still need to figure
out how to download big files and write them to disk.
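
For the big-file part, one common approach with requests is to ask for a
streamed response and write it out chunk by chunk, so the whole body never
sits in memory. The sketch below assumes a reasonably recent requests (one
where the response can be used as a context manager). The names save_stream
and download, the ".part" temporary suffix, and the 1 MiB chunk size are
illustrative choices, not anything the library prescribes.

```python
import os


def save_stream(chunks, path):
    """Write an iterable of byte chunks to path; return the bytes written."""
    written = 0
    tmp_path = path + ".part"  # hypothetical suffix for the partial file
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    os.replace(tmp_path, path)  # rename only after a complete write
    return written


def download(url, path, chunk_size=1024 * 1024):
    """Stream url to path in chunk_size pieces using requests."""
    # Third-party dependency; imported here so save_stream works without it.
    import requests

    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        return save_stream(resp.iter_content(chunk_size=chunk_size), path)
```

Usage would be something like download("https://example.com/big.iso",
"big.iso"), where the URL is a placeholder. Cookies or headers obtained from
the earlier steps would need to be passed along as well, for example through
a requests.Session.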