[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #3 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Wed, 23 Mar 2011 21:16:03
---

Not interested in IP edits, so rev_user!=0 and so I don't need user names - but
I do need the month in the output.  
There are about 650,000 current users, growing approximately linearly since
2005, so that makes 23m possible entries; but probably most ids only appear in
a single month, so the actual output will be much smaller, say 2m. *20
char/record=40mb output, compressed.  
Can we compromise on yearly batches? I'm trying to nail down the problems with
the growth of users mainly in the period 2005-8, so maybe yearly from 2005 with
2001-4 in one batch? But if you have to take smaller bites, so be it.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #11 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Thu, 31 Mar 2011 15:34:24
---

Hmm, it's obviously getting harder ![][1]  
It seems that the whole db is getting too large to get proper stats these days.
It's impossible to get any detail beyond the start of 2010. So - yes, why don't
we call it a day at 2010/02?

   [1]: https://jira.toolserver.org/images/icons/emoticons/sad.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #17 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Fri, 01 Apr 2011 14:38:35
---

I've just noticed the following on the imported data: the full.txt file didn't
include the two 2010 months ![][1]  
I've updated it now ![][2]

(The script I used to merge them
http://toolserver.org/~hoo/dbq/dbq-129/merge.sh used **200** as pattern to find
the files, but the 2010 files aren't including 200 in their names. I've now
changed the script)

   [1]: https://jira.toolserver.org/images/icons/emoticons/sad.gif
   [2]: https://jira.toolserver.org/images/icons/emoticons/wink.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #13 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Fri, 01 Apr 2011 13:04:32
---

Thanks, I've downloaded it and it looks fine.  
However I asked that it wouldn't include IP edits but it appears to and I have
no way of eliminating them.  
The stats say that by Feb 2010 there were 573,891 Wikipedians but the file
includes 3,328,228 user ids for 7,704,721 monthly records. Looking at the id's,
there are a lot of 8 digit ids even in 2001 which most appear for very short
periods.  
Since I'm particularly interested in people who've created ids but never made
more than a few edits it's crucial that I can eliminate IP edits. Is there any
way to do it with the file you've given me?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #14 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Fri, 01 Apr 2011 13:17:10
---

The files seem to be correct, all IP edits have rev_user = 0. The strange
user_id thingy seems fine as well eg. user 11419925 who made 2 edits in
2001-02:

mysql SELECT * FROM user WHERE user_id = '11419925';
+--+---+---++
| user_id  | user_name | user_registration | user_editcount |
+--+---+---++
| 11419925 | Cobrand.bomis.com | 20100112120114|  0 |
+--+---+---++
1 row in set (0.02 sec)


And on wiki you can see (two edits in ZeuS):  
http://en.wikipedia.org/wiki/Special:Contributions/Cobrand.bomis.com

Things like this are caused by imports from the old wikipedia eg. for the
example above the edits are from:
http://nostalgia.wikipedia.org/w/index.php?title=ZeuSaction=history  
such edits are linked to users with the same name, even if they registered
years later ![][1]

   [1]: https://jira.toolserver.org/images/icons/emoticons/wink.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #6 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Tue, 29 Mar 2011 12:47:27
---

Be sure, I know what I'm doing (I'm not using SQL the first time ![][1])...
running it in one would take days and we have an active query killer, so that
queries aren't usually able to run for more than 5-6 hours, further that would
eat a lot of ram ![][2]

I'm currently at 2008-09 and I'm confident that I can give you the results
tomorrow (in one big file and in smaller per month files)

Sorry, that it took so long, but I first choose the wrong (a slower) DB server,
so I lost a lot of time ![][3]

   [1]: https://jira.toolserver.org/images/icons/emoticons/tongue.gif
   [2]: https://jira.toolserver.org/images/icons/emoticons/wink.gif
   [3]: https://jira.toolserver.org/images/icons/emoticons/sad.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #5 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Tue, 29 Mar 2011 12:20:20
---

Have you any idea how long this might take? I presume no partial results will
be available.  
I wonder slightly at the tactic of splitting the task up too finely: it's a
straight query without joins and doing it in too small chunks (ie 120 cf 10)
might lead to a significant increase in overall runtime.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #9 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Thu, 31 Mar 2011 14:29:40
---

Is it possible to release the early years now? I had thought of putting in a
report for the UK Wiki conference and the deadline is 2 Apr so I need to do at
least the 2007 analysis soon.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #1 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Tue, 22 Mar 2011 23:35:23
---

distinguish IP  registered users

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #2 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Wed, 23 Mar 2011 19:08:33
---

So I'm understanding you right, you want it this way:  
user_id, user_name, edits  
for every month from the start of enwiki on?  
(user_name is required if you want IP edits as well, cause IPs don't have a
user_id)

If so, you have to be patient, cause this will take quite a while (I have to
run it per month to not kill the server ![][1])  
Further I will have a separate file for every month, if that's ok

   [1]: https://jira.toolserver.org/images/icons/emoticons/tongue.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #15 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Fri, 01 Apr 2011 13:56:59
---

Hmm, the data is a lot more noisy than I'd hoped. a
href=http://en.wikipedia.org/wiki/Special:Statistics;This page/a says there
are 14m registered users and the number can't surely have quadrupled in the
last year. It says 453m edits whereas the sum in the file you've given me is
230m which is nearer to being likely but still not reassuring. The edits by top
users are nothing like the published figures (starting at 2.8m) but I'll guess
those don't include bots.  
Oh well, I'll continue to see if I can get anything meaningful out...

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #4 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Thu, 24 Mar 2011 15:03:26
---

I just wanted to let you know, that I've started the script ![][1]

I'll let you know when it's finished ![][2]

   [1]: https://jira.toolserver.org/images/icons/emoticons/wink.gif
   [2]: https://jira.toolserver.org/images/icons/emoticons/smile.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #12 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Fri, 01 Apr 2011 11:45:03
---

A few more days and it would been finished...

Code (wrote it in php cause I was to lazy for bash ![][1], but that doesn't
matter):

#!/usr/bin/php -q
?php
$current = '201103';
$month =  '05';
$year = '2007';
$log_file = '/mnt/user-store/hoo/dbq-129.log.txt';
$save_file_pre = '/mnt/user-store/hoo/dbq/dbq-129-';
$db = 'enwiki_p';
$server = '10.24.1.33';
while(true) {
$month++;
if($month  12) {
$month = '01';
$year++;
}
if(strlen($month) == 1) {
$month = '0' . $month;
}
if($current  $year . $month) {
break;
}
while(true) {
$query = SELECT /* SLOW_OK */ rev_user, COUNT(*) as edits,
LEFT(rev_timestamp, 6) as month FROM revision WHERE rev_timestamp LIKE ' .
$year . $month . %' AND rev_user != 0 GROUP BY rev_user;;
$file = $save_file_pre . $year . '-' . $month . '.txt';
exec('mysql --host=' . $server . ' --database=' . $db . '
-e' . $query . ' | cat  ' . $file);
if(filesize($file) != 0) {
break;
}
}
}
?


Result:  
http://toolserver.org/~hoo/dbq/dbq-129/ (dir with the per month files)  
http://toolserver.org/~hoo/dbq/dbq-129/full.txt (everything in one file (plain
text ~125MB), I haven't tested it, but I hope it's going to import fine ![][2]
)

   [1]: https://jira.toolserver.org/images/icons/emoticons/tongue.gif
   [2]: https://jira.toolserver.org/images/icons/emoticons/wink.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #7 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Chris55 mosch...@googlemail.com
Date: Tue, 29 Mar 2011 13:15:38
---

Thanks - wasn't doubting your competence! I'm aware that estimating resource
requirements in large databases is a fine art.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #16 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Fri, 01 Apr 2011 14:06:12
---

Well, of course the revision number differs because of three reasons: it ends
in 2010-02, we excluded IPs and the archive table (which includes the revision
of deleted pages) hasn't been included.  
And yes, bots are included (this is only avoidable with a join against the
user_groups table, which would be quite slow)

If you need it, I can give you a list of all bots, so that you can easily
exclude them out of your data...

I'm now going to import the data myself to my workstation just to take a short
look at the data

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #10 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Thu, 31 Mar 2011 14:39:24
---

Sure, I can give you the results up to 2010-02 in a few hours (still have to re
run some sub queries because the query killer killed some of the earlier
queries) ![][1]

Do you still have a need for the rest of the data then (after April 2)?

   [1]: https://jira.toolserver.org/images/icons/emoticons/wink.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

Bugzilla Bug Importer (valhallasw) wmf.bugconver...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #18 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: h...@online.de
CC list: h...@online.de

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 59384] DBQ-129 create a list of total individual user edits per month for English wikipedia

2014-01-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59384

--- Comment #8 from Bugzilla Bug Importer (valhallasw) 
wmf.bugconver...@gmail.com ---
---
From: Hoo man h...@online.de
Date: Thu, 31 Mar 2011 12:32:28
---

I'm sorry, but I've calculated the ETA wrong (I have thought that the number of
contributors would at maximum double after late 2005, but it grew even
further), so my script is only at 2010-02 atm.

Therefore I can't give you the results today ![][1]

I've recalculated the ETA now and my result is over 40 hours, therefore you
will have to wait till Saturday (April 2)

   [1]: https://jira.toolserver.org/images/icons/emoticons/sad.gif

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l