[Wikitech-l] Wikipedia Dump
Dear All, I have used two dumps from English Wikipedia as below, and the count results turn out like this. Would you please let me know which one is complete and can be analyzed? And I am confused about why the years 2001-2009 have different numbers in the two dumps. Thanks very much!

select count(1), to_char(rev_timestamp,'YYYY')
from enwiki.revision
group by to_char(rev_timestamp,'YYYY')
order by to_char(rev_timestamp,'YYYY');

Resource: http://download.wikimedia.org/enwiki/20100130/enwiki-20100130-stub-meta-history.xml.gz

+----------+---------------------+
| count(1) | year(rev_timestamp) |
+----------+---------------------+
| 57559    | 2001                |
| 616878   | 2002                |
| 1598363  | 2003                |
| 6999869  | 2004                |
| 20697477 | 2005                |
| 57214741 | 2006                |
| 75235972 | 2007                |
| 74757575 | 2008                |
| 70600627 | 2009                |
| 6017974  | 2010                |
+----------+---------------------+

Resource: http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz

64305    2001
616257   2002
1596612  2003
6979494  2004
20642853 2005
57043694 2006
74936692 2007
74387391 2008
70085652 2009
53054853 2010

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
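The per-year grouping above can also be sanity-checked outside the database. This is a minimal Python sketch, assuming rev_timestamp values in MediaWiki's usual MySQL format (14-character YYYYMMDDHHMMSS strings); the sample timestamps below are invented for illustration:

```python
from collections import Counter

def revisions_per_year(timestamps):
    """Count revisions per year from MediaWiki rev_timestamp values,
    assumed to be 14-character strings of the form YYYYMMDDHHMMSS."""
    return Counter(ts[:4] for ts in timestamps)

# Invented sample, not real dump data:
sample = ["20010215120000", "20010301090000", "20020101000000"]
print(revisions_per_year(sample))  # Counter({'2001': 2, '2002': 1})
```

Running the same counter over both stub dumps would show exactly where the two result sets diverge.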
Re: [Wikitech-l] Wikipedia Dump
Dear all, I have used the dump from http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz and imported it into an SQL database. However, I could not see any data for 2001 to 2004. Does anyone know what's wrong? Thanks, Zeyi
[Wikitech-l] mwdumper data
Dear All, I have used mwdumper to convert compressed Wikipedia dumps, but I have to use Oracle. I don't know enough about databases to know: if a dump in MySQL or PostgreSQL format were generated, could it be converted for use with Oracle? Thanks, Zeyi
Re: [Wikitech-l] Get full-protecting article list monthly
On Sep 14 2010, Roan Kattouw wrote:

2010/9/14 zh...@york.ac.uk: Dear All, may I ask how I can get the list of fully protected articles monthly? I can get the current one by searching for the lock link. Are there some tools for this?

http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprtype=edit&apprlevel=sysop&aplimit=max returns the first 500 (or 5,000 if you're a bot or sysop) pages in the main namespace that only sysops can edit (i.e. that are fully protected). If you're not a privileged user and only get 500 entries, you can use the information in the query-continue tag to get the next 500. Roan Kattouw (Catrope)

Thanks for this! I am looking for changes in the set of fully protected articles. May I get the data with a time or date somehow? Is that possible? Thanks, Zeyi
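Roan's suggestion can be sketched in code. This is a minimal sketch, not a tested client: the endpoint and parameters are those given above, the continuation logic follows the query-continue convention he describes, and the `fetch` parameter is an invented seam so the paging loop can be exercised without hitting the live API:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "http://en.wikipedia.org/w/api.php"  # endpoint from the thread

def http_fetch(params):
    """Fetch one decoded JSON response over HTTP (real use only)."""
    with urlopen(API + "?" + urlencode(params)) as resp:
        return json.loads(resp.read())

def fetch_protected_pages(fetch=http_fetch):
    """Collect titles of fully protected pages, following the
    query-continue tag until no continuation remains. `fetch` maps a
    dict of API parameters to a decoded JSON response, so it can be
    swapped for a stub when testing offline."""
    params = {
        "action": "query", "list": "allpages", "apprtype": "edit",
        "apprlevel": "sysop", "aplimit": "max", "format": "json",
    }
    titles = []
    while True:
        data = fetch(params)
        titles.extend(p["title"] for p in data["query"]["allpages"])
        cont = data.get("query-continue", {}).get("allpages")
        if not cont:
            break
        params.update(cont)  # e.g. {"apfrom": "Next/Title"}
    return titles
```

Run monthly (e.g. from cron) and diffed against the previous month's list, this would give the change data Zeyi asks about.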
Re: [Wikitech-l] Get full-protecting article list monthly
Hi, the link doesn't seem to work. Best, Zeyi

On Sep 14 2010, John Doe wrote: Better yet, http://toolserver.org/~betacommand/reports/sysopprotecton.txt which is updated daily, Δ

On Tue, Sep 14, 2010 at 9:08 AM, John Doe phoenixoverr...@gmail.com wrote: If you want, we have a toolserver database query service, and generating such data should be easy. If you file a request at https://jira.toolserver.org/browse/DBQ you should be able to get the data you need. Δ
[Wikitech-l] Get full-protecting article list monthly
Dear All, may I ask how I can get the list of fully protected articles monthly? I can get the current one by searching for the lock link. Are there some tools for this? Thanks, Zeyi
[Wikitech-l] import data through MWDumper
Hi all, I tried to use MWDumper to import data; however, the importer only offers two choices, MySQL and PostgreSQL. What am I supposed to do if I want to import Wikipedia data into an Oracle database? Thanks very much. Zeyi
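One route, since MWDumper has no Oracle output, is to turn its SQL output into CSV and load that with Oracle's SQL*Loader. The sketch below is a deliberately naive illustration of that idea, not a robust converter: it assumes a simplified multi-tuple MySQL INSERT statement whose values contain no embedded commas or parentheses, and the sample statement is invented:

```python
import csv
import io
import re

def insert_values_to_csv(insert_sql):
    """Convert a simplified multi-tuple MySQL INSERT statement into
    CSV text suitable for Oracle SQL*Loader. Naive: assumes values
    contain no embedded commas, quotes, or parentheses; a real dump
    needs a proper SQL parser."""
    body = re.search(r"VALUES\s*(.*)$", insert_sql, re.S).group(1)
    rows = re.findall(r"\(([^)]*)\)", body)
    out = io.StringIO()
    writer = csv.writer(out)
    for row in rows:
        fields = [f.strip().strip("'") for f in row.split(",")]
        writer.writerow(fields)
    return out.getvalue()

sql = "INSERT INTO page VALUES (1,'Main_Page',0),(2,'Help',4);"
print(insert_values_to_csv(sql))
```

The resulting CSV would then be described in a SQL*Loader control file mapping columns onto the Oracle tables.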
[Wikitech-l] question about user_group dump
Hi everyone, thanks for reading. I am a sociology researcher. I have used the English Wikipedia dump enwiki-20100312-user_groups.sql for my research. I am confused by the meaning of 'accountcreator', 'founder' and 'confirmed'; would you please explain them? As I know, users of Wikipedia can change their status by becoming a helper or an admin, or by joining other groups. Since when have the user groups looked like this data shows, and on what date was the data collected? What am I supposed to do if I want data showing the change of user status over time? Thanks very much for the help! Zeyi
Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
On Mar 19 2010, Platonides wrote:

Zeyi wrote: Hi, firstly, congratulations on this! As I know, it has taken a long time! May I ask a small question: what is the difference between the current dump and the history dump? I know the current one only includes current edits, and the history one has all edits, as the introduction said.

You have explained the difference perfectly :)

More specifically, how does the difference show on one article? Can anyone explain it in detail, please?

It doesn't show the article. It's just a really, really large bunch of wikitext separated by XML tags. It is shown by a tool. If you just want to read the articles, you don't need histories.

What I mean is: if the current dump shows there are 30 edits under a particular article name, and the history dump shows there are 100 edits under the same article, what is the difference between these 30 and 100? If I say that the current dump can explain how the current articles were established from different edits, is that correct? Additionally, why do all the statistics of Wikipedia only use the history dump for analysis?

Because they study things like changes made to articles, number of edits per time...

Thanks very much!

You're welcome.
Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
On Mar 19 2010, Conrad Irwin wrote:

On 03/19/2010 11:02 AM, zh...@york.ac.uk wrote: What I mean is: if the current dump shows there are 30 edits under a particular article name, and the history dump shows there are 100 edits under the same article, what is the difference between these 30 and 100?

The current dump shows 1 edit for each article, only the most recent at the time that article was processed. The history dump shows all edits for all articles. Conrad

Wow, can you confirm that only the latest edit is collected in the current dump? So the current dump isn't meaningful in terms of statistics? Thanks, Zeyi
Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
Hi, firstly, congratulations on this! As I know, it has taken a long time! May I ask a small question: what is the difference between the current dump and the history dump? I know the current one only includes current edits, and the history one has all edits, as the introduction said. More specifically, how does the difference show on one article? Can anyone explain it in detail, please? Additionally, why do all the statistics of Wikipedia only use the history dump for analysis? Thanks very much! Zeyi
[Wikitech-l] query data from Wikipedia dump?
I don't know if this is the place to ask this. I heard there is a site where people voluntarily help with querying data from the Wikipedia dumps and solving statistics problems. May I ask whether it still exists? If so, how can I find it? Best, Zeyi
Re: [Wikitech-l] Wikipedia database
Thanks, but is page_latest unique in the page table?

On Nov 21 2009, Roan Kattouw wrote:

2009/11/21 zh...@york.ac.uk: I need to use rev_user and page_namespace to do a cross-analysis. How can I put them in one table? Thanks again.

You don't need to put them in one table, just use a query with a JOIN. Roan Kattouw (Catrope)
Re: [Wikitech-l] Wikipedia database
On Nov 20 2009, Platonides wrote:

Zeyi wrote: I took the sub-current data from MediaWiki and imported it into Oracle.

Which tool did you use for the import?

I used the xml2sql tool, which is easy to use.

I found there are two identical page_latest IDs in the page table. Then when I tried to join the revision table and the page table together, this produced two identical rev_ids.

Which pages are those?

All kinds of pages. Is the page_latest ID unique? May I ask why I have two identical page_latest values in the page table, and what that means? If I want to put the revision table and the page table together, which column should be the link point?

You shouldn't have that situation. And why are you merging page and revision, anyway?

I need to use rev_user and page_namespace to do a cross-analysis. How can I put them in one table? Thanks again.

Thanks, Zeyi
[Wikitech-l] Wikipedia database
Greetings, may I ask a question about the Wikipedia database? I downloaded the Wikipedia current revision data and found there are some records that have exactly the same rev_id, rev_user and timestamp. What does that mean? Are they the same edit or different edits? Best, Zeyi
Re: [Wikitech-l] Wikipedia database
On Nov 19 2009, Roan Kattouw wrote:

2009/11/19 zh...@york.ac.uk: Greetings, may I ask a question about the Wikipedia database? I downloaded the Wikipedia current revision data and found there are some records that have exactly the same rev_id, rev_user and timestamp. What does that mean? Are they the same edit or different edits?

If they belong to the same wiki, they're very likely to be the same edit. Of course such duplicates should theoretically not occur. Roan Kattouw (Catrope)

Thanks. I noticed that because I joined the revision table and the page table together. May I ask why, for the same page.page_latest, there are two identical records in the table? Is the link between revision and page rev_id = page.page_latest? Thanks. Zeyi
[Wikitech-l] 5 million
Hi everyone, Wikimedia Commons, the media repository site used by Wikipedia, today reached the 5 million media files milestone. Every one of these media files is available under a free license, such that anyone can use them for any purpose. Wikimedia Commons is the largest free media repository on the internet. Zeyi He, Wikimedia UK