On 08/04/16 01:51, Jason Willis wrote:
> Though, I do know some things and can figure a little bit out when
> looking at source code I'm usually at a loss when understanding the
> entire workings of a program.
And that's the problem here. The code is very specific to the page it
is parsing. Simply substituting a different file will never work.
For example...

> DOC_ROOT = 'http://freeproxylists.com'
> ELITE_PAGE = 'elite.html'
>
> def _extract_ajax_endpoints(self):
>     ''' make a GET request to freeproxylists.com/elite.html '''
>     url = '/'.join([DOC_ROOT, ELITE_PAGE])
>     response = requests.get(url)
>
>     ''' extract the raw HTML doc from the response '''
>     raw_html = response.text
>
>     ''' convert raw html into BeautifulSoup object '''
>     soup = BeautifulSoup(raw_html)
>
>     for url in soup.select('table tr td table tr td a'):
>         if 'elite #' in url.text:
>             yield '%s/load_elite_d%s' % (DOC_ROOT,
>                                          url['href'].lstrip('elite/'))

Notice that last 'if' section has 'elite #' hard coded in. But the
standard page doesn't use 'elite #'... There are probably a lot more
similar content-dependent things in the code; I just happened to spot
that one. (A rough sketch of what a more general version might look
like is at the end of this message.)

It would be better if you took the time (only a few hours really) to
learn how to program in Python so that you can actually understand the
code rather than making "poke 'n hope" changes.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
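For what it's worth, here is a rough sketch of what a more general
version might look like, with the page-specific strings pulled out as
parameters. The function and parameter names, the 'html.parser'
choice and the way the endpoint URL is rebuilt are my own guesses
based on the quoted code above, not anything taken from the real
script or site, so treat it as an illustration of where the
page-dependent pieces live rather than a drop-in replacement:

import requests
from bs4 import BeautifulSoup

DOC_ROOT = 'http://freeproxylists.com'

def extract_ajax_endpoints(page, label, prefix):
    ''' Fetch DOC_ROOT/<page> and yield an AJAX endpoint for every
        link whose text contains <label>. For the elite page that
        would be page='elite.html', label='elite #',
        prefix='load_elite_d'. '''
    response = requests.get('/'.join([DOC_ROOT, page]))
    soup = BeautifulSoup(response.text, 'html.parser')

    for link in soup.select('table tr td table tr td a'):
        if label in link.text:
            # keep only the last part of the href, e.g.
            # 'elite/1421234567.html' -> '1421234567.html'
            page_id = link['href'].split('/')[-1]
            yield '%s/%s%s' % (DOC_ROOT, prefix, page_id)

for endpoint in extract_ajax_endpoints('elite.html', 'elite #',
                                       'load_elite_d'):
    print(endpoint)

To adapt that for the standard page you would still have to read that
page's HTML and work out the equivalent label, href layout and
endpoint prefix - which is exactly the kind of detective work that
needs some working knowledge of Python and BeautifulSoup.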