[Wikitech-l] Wikipedia Dump

2011-02-03 Thread zh509
Dear All, 

I have used two dumps from the English Wikipedia, listed below, and the 
revision counts per year turned out as follows. Could you please let me know 
which one is complete and can be analyzed? I am also confused about why the 
counts for 2001-2009 differ between the two dumps. Thanks very much!

select count(1), to_char(rev_timestamp, 'YYYY') from enwiki.revision group 
by to_char(rev_timestamp, 'YYYY') order by to_char(rev_timestamp, 'YYYY')


resource is : 
http://download.wikimedia.org/enwiki/20100130/enwiki-20100130-stub-meta-history.xml.gz

+----------+---------------------+
| count(1) | year(rev_timestamp) |
+----------+---------------------+
|    57559 |                2001 |
|   616878 |                2002 |
|  1598363 |                2003 |
|  6999869 |                2004 |
| 20697477 |                2005 |
| 57214741 |                2006 |
| 75235972 |                2007 |
| 74757575 |                2008 |
| 70600627 |                2009 |
|  6017974 |                2010 |
+----------+---------------------+


 
resource is : 
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz

+----------+---------------------+
| count(1) | year(rev_timestamp) |
+----------+---------------------+
|    64305 |                2001 |
|   616257 |                2002 |
|  1596612 |                2003 |
|  6979494 |                2004 |
| 20642853 |                2005 |
| 57043694 |                2006 |
| 74936692 |                2007 |
| 74387391 |                2008 |
| 70085652 |                2009 |
| 53054853 |                2010 |
+----------+---------------------+
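
For reference, the year(rev_timestamp) header in the first result suggests 
the counts were produced with MySQL, while the query above is 
Oracle-flavoured. A minimal MySQL equivalent (a sketch, assuming 
rev_timestamp is stored in MediaWiki's usual YYYYMMDDHHMMSS form) would be:

  -- Revisions per year on MySQL; LEFT(rev_timestamp, 4) also works if
  -- YEAR() does not accept the stored timestamp format.
  SELECT COUNT(1), YEAR(rev_timestamp)
  FROM revision
  GROUP BY YEAR(rev_timestamp)
  ORDER BY YEAR(rev_timestamp);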

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikipedia Dump

2011-01-28 Thread zh509
Dear all,

I have used the dump from 
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz 
and imported it into an SQL database.

However, I cannot see any data from 2001 to 2004. Does anyone know what's 
wrong?

thanks,

Zeyi
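
(For what it's worth, the check I would run first, sketched on the 
assumption that the revisions were loaded into an enwiki.revision table with 
a date-typed rev_timestamp, as in the counting query elsewhere in this 
thread: if the earliest revision is later than 2001, the import was probably 
truncated rather than the dump being incomplete.)

  -- Range and total of revision timestamps that actually made it into the DB.
  SELECT MIN(rev_timestamp) AS first_rev,
         MAX(rev_timestamp) AS last_rev,
         COUNT(*)           AS total_revs
  FROM enwiki.revision;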
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





[Wikitech-l] mwdumper data

2010-10-29 Thread zh509
Dear All,

I have used mwdumper to convert the compressed Wikipedia dumps, but the 
database I have available is Oracle. I don't know enough about databases to 
tell: if a dump in MySQL or PostgreSQL format were generated, could it be 
converted for use with Oracle? Thanks,

Zeyi
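
From what I can tell so far, the differences are mostly syntactic. Purely as 
an illustration (this is not mwdumper's exact output): MySQL-style dumps 
typically use backtick-quoted identifiers and multi-row INSERTs, which 
Oracle does not accept, so statements would need rewriting along these lines 
before loading.

  -- MySQL-flavoured form (hypothetical example rows):
  INSERT INTO `page` (`page_id`, `page_namespace`, `page_title`) VALUES
    (1, 0, 'Foo'),
    (2, 0, 'Bar');

  -- Oracle-compatible form: unquoted identifiers, one row per INSERT.
  INSERT INTO page (page_id, page_namespace, page_title) VALUES (1, 0, 'Foo');
  INSERT INTO page (page_id, page_namespace, page_title) VALUES (2, 0, 'Bar');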



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread zh509
On Sep 14 2010, Roan Kattouw wrote:

2010/9/14  zh...@york.ac.uk:
 Dear All,

 May I ask how I can get the list of fully protected articles on a monthly 
 basis? I can get the current one by searching for the lock icon. Are there 
 some tools for this?

  
 http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprtype=edit&apprlevel=sysop&aplimit=max
  
 returns the first 500 (or 5,000 if you're a bot or sysop) pages in the 
 main namespace that only sysops can edit (i.e. that are fully protected). 
 If you're not a privileged user and only get 500 entries, you can use the 
 information in the query-continue tag to get the next 500. Roan Kattouw 
 (Catrope)

Thanks for this!

I am looking at how the set of fully protected articles changes over time. 
Can I get the data with a time or date attached? Is that possible? Thanks,

Zeyi


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





Re: [Wikitech-l] Get full-protecting article list monthly

2010-09-14 Thread zh509
Hi, the link doesn't seem to work. 

best,

Zeyi

On Sep 14 2010, John Doe wrote:

 Better yet, http://toolserver.org/~betacommand/reports/sysopprotecton.txt 
 which is updated daily,

Δ


 On Tue, Sep 14, 2010 at 9:08 AM, John Doe phoenixoverr...@gmail.com 
 wrote:

 If you want, we have a Toolserver database query service; generating
 such data should be easy. If you file a request at
 https://jira.toolserver.org/browse/DBQ you should be able to get the data
 you need.

 Δ



 On Tue, Sep 14, 2010 at 8:59 AM, zh...@york.ac.uk wrote:

 On Sep 14 2010, Roan Kattouw wrote:

 2010/9/14  zh...@york.ac.uk:
  Dear All,
 
  May I ask how I can get the list of fully protected articles on a monthly
  basis? I can get the current one by searching for the lock icon. Are there
  some tools for this?
 
 
 
  
  
  http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprtype=edit&apprlevel=sysop&aplimit=max
  returns the first 500 (or 5,000 if you're a bot or sysop) pages in the
  main namespace that only sysops can edit (i.e. that are fully protected).
  If you're not a privileged user and only get 500 entries, you can use the
  information in the query-continue tag to get the next 500. Roan
  Kattouw (Catrope)

 Thanks for this!

 I am looking at how the set of fully protected articles changes over time.
 Can I get the data with a time or date attached? Is that possible? Thanks,

 Zeyi

 



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



[Wikitech-l] Get full-protecting article list monthly

2010-09-13 Thread zh509
Dear All,

May I ask how I can get the list of fully protected articles on a monthly 
basis? I can get the current one by searching for the lock icon. Are there 
some tools for this?

thanks,

Zeyi
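
(In case a dump-based route is easier than the API for a research workflow, 
a sketch assuming the page and page_restrictions SQL dumps are imported: it 
lists main-namespace pages that only sysops may edit. Each dump only 
reflects its snapshot date, so a monthly series would need one dump per 
month.)

  -- Fully protected (sysop-only edit) pages in the main namespace.
  SELECT p.page_title
  FROM page p
  JOIN page_restrictions pr ON pr.pr_page = p.page_id
  WHERE p.page_namespace = 0
    AND pr.pr_type = 'edit'
    AND pr.pr_level = 'sysop';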


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





[Wikitech-l] import data through MWDumper

2010-05-25 Thread zh509
Hi, all,

I am trying to use MWDumper to import data; however, the importer only 
offers two choices, MySQL and PostgreSQL. What am I supposed to do if I want 
to import Wikipedia data into an Oracle database?

Thanks very much.

Zeyi

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] question about user_group dump

2010-05-22 Thread zh509
Hi, everyone,

Thanks for reading. 

I am a sociological researcher. I have used the English Wikipedia dump 
enwiki-20100312-user_groups.sql for my research. I am confused about the 
meaning of 'accountcreator', 'founder' and 'confirmed'; could you please 
explain them?

As I understand it, Wikipedia users can change their status by becoming 
helpers or admins, or by joining other groups. Since when have the user 
groups been as this data shows, i.e. on what date was this data collected? 
What am I supposed to do if I want data showing changes in user status over 
time?

Thanks very much for help!

Zeyi
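
(For reference, this is as far as I got with the table itself, a sketch 
assuming the dump creates the standard user_groups table: it appears to 
record only current membership, with no timestamps, so the change of status 
over time seemingly cannot be read from this table alone.)

  -- How many users hold each group flag in this snapshot.
  SELECT ug_group, COUNT(*) AS members
  FROM user_groups
  GROUP BY ug_group
  ORDER BY members DESC;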


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

2010-03-19 Thread zh509
On Mar 19 2010, Platonides wrote:

Zeyi wrote:
 Hi,

 Firstly, congratulations on this! As I know, it has taken a long time!

 And may I ask a small question: what is the difference between the current
 dump and the history dump? I know the current one only includes current
 edits, and the history one has all edits, as the introduction says.

You have explained the difference perfectly :)

 More specifically, how does the difference show up for one article? Can
 anyone explain it in detail, please?

It doesn't show the article. It's just a really, really large bunch of 
wikitext separated by XML tags.
It is shown by a tool. If you just want to read the articles, you don't 
need histories.

What I mean is: if the current dump shows there are 30 edits under a 
particular article name, and the history dump shows there are 100 edits 
under the same article, what is the difference between these 30 and 100?

If I say that the current dump can explain how the current articles were 
established from different edits, is that correct?

 Additionally, why do all the Wikipedia statistics only use the history dump
 for analysis?

Because they study things like changes made to articles, the number of edits 
per time period, and so on.

 Thanks very much!

You're welcome.



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

2010-03-19 Thread zh509
On Mar 19 2010, Conrad Irwin wrote:


On 03/19/2010 11:02 AM, zh...@york.ac.uk wrote:

 What I mean is: if the current dump shows there are 30 edits under a
 particular article name, and the history dump shows there are 100 edits
 under the same article, what is the difference between these 30 and 100?

The current dump shows 1 edit for each article, only the most recent at
the time that article was processed. The history dump shows all edits
for all articles.

Conrad

Wow, can you confirm that only the latest edit is included in the current 
dump? So the current dump isn't really meaningful for statistics on editing 
activity?

thanks,
Zeyi
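
(A quick way to see this on an imported database, sketched assuming the 
standard revision table: in a current dump every page should have exactly 
one revision, so this query should return no rows, whereas on a history dump 
it lists every page with more than one edit.)

  -- Pages with more than one revision; empty after a pages-meta-current import.
  SELECT rev_page, COUNT(*) AS revisions
  FROM revision
  GROUP BY rev_page
  HAVING COUNT(*) > 1;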
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

2010-03-18 Thread zh509
Hi, 

Firstly, congratulations on this! As I know, it has taken a long time!

And may I ask a small question: what is the difference between the current 
dump and the history dump? I know the current one only includes current 
edits, and the history one has all edits, as the introduction says. More 
specifically, how does the difference show up for one article? Can anyone 
explain it in detail, please?

Additionally, why do all the Wikipedia statistics only use the history dump 
for analysis?

Thanks very much!

Zeyi


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] query data from Wikipedia dump?

2010-03-09 Thread zh509
I don't know if this is the right place to ask. I heard there is a website 
where people voluntarily help with querying data from the Wikipedia dumps 
and with solving statistical problems.

May I ask whether it still exists? If so, how can I find it?

best, 

Zeyi





___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikipedia database

2009-11-23 Thread zh509
Thanks, but is page_latest unique in the page table?

On Nov 21 2009, Roan Kattouw wrote:

2009/11/21  zh...@york.ac.uk:
 I need to use rev_user and page_namespace to do a cross-analysis. How can I
 put them in one table? Thanks again.

You don't need to put them in one table, just use a query with a JOIN.

Roan Kattouw (Catrope)
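
(A sketch of the JOIN being suggested, assuming the standard MediaWiki 
schema in which revision.rev_page references page.page_id: it counts edits 
per user and namespace without physically merging the two tables.)

  -- Cross-analysis of rev_user against page_namespace via a JOIN.
  SELECT r.rev_user, p.page_namespace, COUNT(*) AS edits
  FROM revision r
  JOIN page p ON p.page_id = r.rev_page
  GROUP BY r.rev_user, p.page_namespace
  ORDER BY edits DESC;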

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





Re: [Wikitech-l] Wikipedia database

2009-11-21 Thread zh509
On Nov 20 2009, Platonides wrote:

Zeyi wrote:
 I took the sub-current data from MediaWiki and imported it into Oracle. 
Which tool did you use for the import?

I used the xml2sql tool, which is easy to use. 

 I found there are two identical page_latest IDs in the page table. Then, 
 when I tried to join the revision table and the page table together, this 
 produced two identical rev_id values.

Which pages are those?

All kinds of pages. Is the page_latest ID supposed to be unique?


 May I ask why I have two identical page_latest values in the page table, 
 and what that means? If I want to put the revision table and the page table 
 together, what should be the join key?

You shouldn't have that situation.
And why are you merging page and revision, anyway?

I need to use rev_user and page_namespace to do a cross-analysis. How can I 
put them in one table? Thanks again.

 thanks,
 Zeyi


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





[Wikitech-l] Wikipedia database

2009-11-19 Thread zh509
Greeting,

May I ask a question about the Wikipedia database? I downloaded the current 
Wikipedia revision data and found that some records have exactly the same 
rev_id, rev_user and timestamp. What does that mean? Are they the same edit 
or different edits?

best,

Zeyi

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikipedia database

2009-11-19 Thread zh509
On Nov 19 2009, Roan Kattouw wrote:

2009/11/19  zh...@york.ac.uk:
 Greeting,

 May I ask a question about the Wikipedia database? I downloaded the current 
 Wikipedia revision data and found that some records have exactly the same 
 rev_id, rev_user and timestamp. What does that mean? Are they the same edit 
 or different edits?

If they belong to the same wiki, they're very likely to be the same
edit. Of course such duplicates should theoretically not occur.

Roan Kattouw (Catrope)


Thanks. I noticed that because I joined the revision table and the page 
table together. May I ask why, for the same page.page_latest, there are two 
identical records in the table? Is the link between revision and page 
rev_id = page.page_latest?

thanks. 

Zeyi
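
(For reference, a sketch of the join in question, assuming the standard 
table names: page_latest points at the newest revision of each page, so 
joining on page.page_latest = revision.rev_id should match exactly one 
revision per page.)

  -- Latest revision of every page via the page_latest link.
  SELECT p.page_id, p.page_title, r.rev_id, r.rev_timestamp
  FROM page p
  JOIN revision r ON r.rev_id = p.page_latest;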

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l





[Wikitech-l] 5 millions

2009-09-02 Thread zh509
Hi, everyone,

Wikimedia Commons, the media repository site used by Wikipedia, today just 
reached the 5 million media files milestone. Every one of these media files 
is available under a free license, such that anyone can use them for any 
purpose. Wikimedia Commons is the largest free media repository on the 
internet.

Zeyi He 

Wikimedia UK

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l