GoranSMilovanovic added a comment.
@Jan_Dittrich Here we go:
– 1.1. "Q1.1 Checking for power-law behavior" has two log scaled axis, if I read it correctly. I do not get what the numerical labels on the y axis mean – is this number of users, but after log transformation, so 10000 users becomes 9.21… on the log scale? The log is a natural logarithm, correct? (2.718281828459…)
Correct. Both the number of revisions (x-axis) and the number of users who made the respective number of revisions (y-axis) were log-transformed (it is indeed a natural log, base e) and then plotted against each other.
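To make the transformation concrete, here is a minimal sketch in Python of how such log-log points could be computed. It is not the code used for the report: the input list and the function name are made up for illustration only.

```python
import math
from collections import Counter

# Hypothetical input: total number of revisions made by each user (made-up values).
revisions_per_user = [1, 1, 1, 1, 2, 2, 3, 5, 8, 120]


def log_log_frequencies(counts):
    """Return (log(revisions), log(number of users)) pairs using the natural log."""
    # How many users made exactly n revisions, for each observed n.
    users_per_revision_count = Counter(counts)
    return [
        (math.log(n_revisions), math.log(n_users))
        for n_revisions, n_users in sorted(users_per_revision_count.items())
    ]


points = log_log_frequencies(revisions_per_user)
```

On this scale a y value of about 9.21 corresponds to 10,000 users, since ln(10000) ≈ 9.21, which matches the reading in the question.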
– 1.2. Q1.2 and Q1.3 show the same data as Q1.1, but in natural numbers without transformations,
Yes, except that Q1.3 is misnamed: it should be titled "Q1.3 Histogram: Distribution of the number of users across revisions". But yes, both the Q1.2 and Q1.3 sections deal with the same data as Q1.1, except that they operate on the natural (count) scale, not on log-transforms.
– 1.3. The diagrams tell "We have many accounts that edited a few times and a small amount of accounts which edit(ed) a lot"
Exactly.
– 1.4. To clarify, this is how many revisions a specific account has created until the data was acquired? (I call "revisions" "edits", it seems then)
That is true. Q1.3 is the distribution of the number of users who have made a particular number of revisions (edits, if you prefer), up to the moment of data acquisition for this analysis.
– 2.1 Q2.x are like 1.x, but only for september 2018.
Yes. Initially, you asked for an overview of the last 30 days or so. However, the wmf.mediawiki_history Hadoop table, where the data for this analysis is found, is partitioned by snapshot, where a snapshot represents all the data up to a particular month (e.g. everything since the beginning of history up to August 2018, September 2018, October 2018, etc.). At the time I ran the analysis, the September snapshot was the freshest one available, so you don't see any of the October 2018 data here, for example. We will not be able to do October 2018 before November 2018; it takes some time for that table to update.
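Since a snapshot is cumulative, a single-month view has to be cut out of it by date. A minimal Python sketch of that idea follows; it does not query wmf.mediawiki_history, and the event list, user names and function name are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical content of a cumulative snapshot: (user, date of revision) pairs.
events = [
    ("A", date(2018, 8, 30)),
    ("A", date(2018, 9, 2)),
    ("B", date(2018, 9, 15)),
    ("C", date(2017, 1, 5)),
    ("B", date(2018, 9, 28)),
]


def revisions_in_month(events, year, month):
    """Count revisions per user, keeping only events from the given month."""
    counts = Counter()
    for user, day in events:
        if day.year == year and day.month == month:
            counts[user] += 1
    return dict(counts)


september_counts = revisions_in_month(events, 2018, 9)
```

Users with no revisions in the selected month (user C here) simply do not appear in the result.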
– 2.2 The diagrams tell "recent edit counts by user follow the same pattern (log) as the general edit count distribution"
Q2.2 does not use any log scaling (nor does the corresponding Q1.2), but yes, it tells us that, in qualitative terms, the distribution of the number of users who have made a particular number of edits is no different from what we saw when we analyzed all of the data in the Q1 sections. One could also compare Q2.3 to Q1.3 and reach the same judgment by eyeballing alone. I am not very enthusiastic about going into strict statistical, model-based comparisons between the two distributions here, but if you really want to verify that the claim holds as stated, let me know.
– 3. Q3 The diagrams are indeed tricky to read. I would read the crosstabular one like a scatterplot, however, in this case, it would need jitter, would it? Maybe a "heatmap" like approach might also be OK: A 2-D-bin would be darker if more edit/discussion counts fall into it.
As you have already observed, it did cause me a bit of a headache too. Please allow me some time to figure out the most informative approach to this visualization; as soon as I have it, you will have it too. Thank you.
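For reference, the 2-D binning idea from the question could be sketched roughly as below. This is only an illustration in Python under made-up data, not necessarily the approach the revised diagrams will take; the pairs, bin widths and function name are all hypothetical.

```python
from collections import Counter

# Hypothetical (edit count, discussion count) pairs, one per user.
pairs = [(1, 0), (2, 1), (3, 0), (15, 4), (18, 6), (120, 40)]


def bin_2d(pairs, x_width, y_width):
    """Count how many points fall into each (x_bin, y_bin) cell of a regular grid."""
    cells = Counter()
    for x, y in pairs:
        # Integer division maps each value to the index of its bin.
        cells[(x // x_width, y // y_width)] += 1
    return dict(cells)


cells = bin_2d(pairs, x_width=10, y_width=5)
```

Each cell count would then drive the shade of the corresponding tile, so overlapping points show up as darker tiles instead of hiding each other, which is the problem jitter would otherwise address.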
– 4. Q4.2 is again a diagram I am unable to read well. Could it be that the labels are off? X seems to show dates, but y seems to show dates (year-month), too, but log scaled? So I could not make sense of it.
Oh, oh, oh: sorry. The axis labels are off: x represents Month-Year, y represents log(number of revisions). I will correct this as soon as I am done with the tricky Q3 diagrams.
Please let me know if you have any additional questions. Thank you for your feedback!
