On 21-06-15 at 22:04, Joshua Valdez wrote:
I'm having trouble making this script work to scrape information from a
series of Wikipedia articles.

What I'm trying to do is iterate over a series of wiki URLs and pull out
the page links on a wiki portal category (e.g.
https://en.wikipedia.org/wiki/Category:Electronic_design).
Instead of scraping the webpage, I'd have a look at the API. It will likely give better and more reliable results than parsing the HTML.

https://www.mediawiki.org/wiki/API:Main_page

You can try out the large number of options (each with a short description) on the sandbox page:

https://en.wikipedia.org/wiki/Special:ApiSandbox
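
As a rough idea of what that could look like for your category example, here is a minimal sketch using only the standard library. It assumes the `categorymembers` list query of the API and follows the `continue` block the API returns when there are more results. The function names (`build_params`, `fetch_json`, `category_pages`) and the User-Agent string are my own placeholders, not anything from your script, so adjust as needed.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(category, cont=None):
    """Query parameters for one batch of pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Category:Electronic_design"
        "cmtype": "page",      # skip subcategories and files
        "cmlimit": "500",
        "format": "json",
    }
    # The API hands back a "continue" dict; sending it along with the
    # next request resumes where the previous batch stopped.
    params.update(cont or {})
    return params


def fetch_json(params):
    """Perform one GET request against the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with something identifying your script.
    req = urllib.request.Request(url, headers={"User-Agent": "tutor-list-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_pages(category, fetch=fetch_json):
    """Yield the titles of all pages in a category, batch by batch."""
    cont = None
    while True:
        data = fetch(build_params(category, cont))
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        cont = data.get("continue")
        if not cont:
            break
```

Looping over category_pages("Category:Electronic_design") and printing each item should then list the page titles. The `fetch` argument is only there so the function can be tried without network access.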

Timo

*Joshua Valdez*
*Computational Linguist : Cognitive Scientist*

(440)-231-0479
jd...@case.edu <j...@uw.edu> | j...@uw.edu | jo...@armsandanchors.com
<http://www.linkedin.com/in/valdezjoshua/>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

