On 01/08/15 11:48, Gaurav Lathwal wrote:

I want to write a script that automatically downloads all the videos hosted
on this site :-

http://www.toonova.com/batman-beyond

The first thing to ask is whether they allow robotic downloads
from the site. If they are funded by advertising then they may
not permit it, and it would be self-defeating to try, since you
would be helping to close down your source!

Now, the problem I am having is, I am unable to fetch the video urls of all
the videos.

I assume you want to fetch the videos not just the URLs?
Fetching the URLs is easy enough and I doubt the site would object
too strongly. But fetching the videos is much harder since:

a) The page you give only has links to separate pages for each
   video.
b) The separate pages have a download link which is to a
   tiny url which may well change.
c) The separate page is not static HTML (or even server-generated
   HTML); it is built in part by JavaScript code when the page
   loads. That means it is very likely to change on each load
   (possibly deliberately, to foil robots!)

I mean I can manually fetch the video urls using the Chrome
developer console, but it's too time-consuming.
Is there any way to just fetch all the video urls using BeautifulSoup ?

It's probably possible as a one-off, but it may not work reliably
for future use. And that's assuming the site allows it in the
first place.
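For the easy part — collecting the per-episode page links from the
listing page — a BeautifulSoup sketch might look like the one below.
Note that the markup here is a stand-in I made up; the real page's
tags, classes and URL scheme are assumptions, so inspect the actual
page in the browser's developer tools and adjust the selector:

```python
from bs4 import BeautifulSoup

# A made-up stand-in for the kind of markup the episode list might
# use -- the real structure will differ, so check it yourself.
SAMPLE_HTML = """
<html><body>
  <table class="episode-list">
    <tr><td><a href="/batman-beyond-episode-1">Episode 1</a></td></tr>
    <tr><td><a href="/batman-beyond-episode-2">Episode 2</a></td></tr>
  </table>
</body></html>
"""

def episode_links(html, base="http://www.toonova.com"):
    """Return absolute URLs for every episode link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector is an assumption; replace it with whatever
    # actually wraps the episode links on the real page.
    return [base + a["href"]
            for a in soup.select("table.episode-list a[href]")]

print(episode_links(SAMPLE_HTML))
```

In practice you would feed it the page body fetched with urllib or
requests instead of SAMPLE_HTML. But remember point (c) above: this
only gets you the episode *pages*; the actual video URLs are built
by JavaScript, which BeautifulSoup never runs.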

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor