李琴 wrote:
> Hi all,
>   I have built a LocalWiki. Now I want to keep its data consistent
> with Wikipedia, and one task I have to do is to fetch the updated
> data from Wikipedia.
> I get the URLs by parsing the RSS feed
> (http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%B4%E6%94%B9&feed=rss)
> and then get all the HTML content of the edit box by opening each
> URL and clicking ‘edit this page’.
> (eg:
> http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%81%8A%E6%88%B2%E7%AF%80%E7%9B%AE)&diff=12199398&oldid=prev
> whose edit interface is
> http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%81%8A%E6%88%B2%E7%AF%80%E7%9B%AE)&action=edit
> ). However, I have encountered two problems during this work.
> Firstly, sometimes I cannot open a URL taken from the RSS feed, and
> I don’t know why. Is it because I visit too frequently and my IP
> address has been blocked, or because the network is too slow?
> If the reason is the former, how often may I request a page from
> Wikipedia? Is there a rate limit?
> Secondly, as mentioned above, I want to download all the HTML of the
> content in the edit box from Wikipedia. Sometimes this works, but at
> other times I can only download part of it. What could be the reason?
> 
> Thanks
> 
> vanessa

Using the API or Special:Export you can request several pages per HTTP
request, which is nicer to the system. You should also add a maxlag
parameter.
Obviously you must set a proper User-Agent, so that if your bot causes
issues you can be contacted/banned.
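
For instance, something along these lines (a minimal sketch in Python
with the requests library; the bot name and contact address are
placeholders, everything else is the standard action API at api.php):

import time
import requests

API = "http://zh.wikipedia.org/w/api.php"
HEADERS = {
    # A descriptive User-Agent with contact details, so operators can
    # reach you (or block just this bot) if something goes wrong.
    # Name and address here are placeholders.
    "User-Agent": "LocalWikiSyncBot/0.1 (contact: you@example.com)",
}

def fetch_revisions(titles):
    # Fetch the current wikitext of several pages in one HTTP request.
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content|timestamp",
        "titles": "|".join(titles),  # batch several pages per request
        "maxlag": 5,                 # ask the server to refuse us when lagged
        "format": "json",
    }
    while True:
        r = requests.get(API, params=params, headers=HEADERS, timeout=30)
        data = r.json()
        if data.get("error", {}).get("code") == "maxlag":
            time.sleep(5)            # replication lag; wait and retry
            continue
        return data["query"]["pages"]

This should also sidestep your second problem: fetching the wikitext
through the API instead of scraping action=edit pages means you no
longer depend on a fully rendered HTML page arriving intact.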

The Wikimedia Foundation offers a live feed to keep wikis up-to-date;
see <http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service>.
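
If the feed service is not an option, polling the API's recentchanges
list is still friendlier than scraping the RSS page. Roughly, under the
same assumptions as the sketch above:

import requests

API = "http://zh.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "LocalWikiSyncBot/0.1 (contact: you@example.com)"}

def recent_changes(since):
    # List changes newer than `since` (ISO 8601, e.g. 2010-01-01T00:00:00Z).
    # Results come back newest first, so rcend marks the oldest change
    # we still want.
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcend": since,
        "rcprop": "title|ids|timestamp",
        "rclimit": 100,
        "format": "json",
    }
    r = requests.get(API, params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["query"]["recentchanges"]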


