On 21-06-15 at 22:04, Joshua Valdez wrote:
I'm having trouble making this script work to scrape information from a
series of Wikipedia articles.
What I'm trying to do is iterate over a series of wiki URLs and pull out
the page links on a wiki portal category (e.g.
https://en.wikipedia.org/wiki/Category:Electronic_design).
Instead of scraping the webpage, I'd have a look at the API. It should
give much better and more reliable results than parsing the HTML.
https://www.mediawiki.org/wiki/API:Main_page
You can try out the large number of options (each with a short
description) on the sandbox page:
https://en.wikipedia.org/wiki/Special:ApiSandbox
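
As a rough starting point, here is a minimal sketch of listing the pages in
a category through the API rather than by parsing HTML. It assumes the
standard `action=query` / `list=categorymembers` module and uses only the
standard library. The function names (`fetch_json`, `category_members`) and
the User-Agent string are made up for illustration, so adapt them to your
own script.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); use one that identifies your script.
    request = urllib.request.Request(
        url, headers={"User-Agent": "tutor-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category, fetch=fetch_json):
    """Yield the titles of all members of a category.

    `fetch` is injectable so the paging logic can be tried without a network.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # The API returns a "continue" object; merge it into the next request.
        params = {**params, **data["continue"]}
```

Something like `for title in category_members("Category:Electronic_design"): print(title)`
should then print one title per line. Subcategories come back as members
too (with a "Category:" prefix), so you may want to filter or recurse on
those depending on what you need.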
Timo
*Joshua Valdez*
*Computational Linguist : Cognitive Scientist*
(440)-231-0479
jd...@case.edu <j...@uw.edu> | j...@uw.edu | jo...@armsandanchors.com
<http://www.linkedin.com/in/valdezjoshua/>
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor