[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-11-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich Jan, please: do we need this ticket anymore? Thanks!

TASK DETAIL
https://phabricator.wikimedia.org/T206214

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Daniel_Mietchen, Lydia_Pintscher, GoranSMilovanovic, WMDE-leszek, Aklapper, Jan_Dittrich, Nandana, Lahi, Gq86, QZanden, LawExplorer, D3r1ck01, Jonas, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-11-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich Any feedback in relation to this? Shall I close the task as resolved or do you have any further requests in relation to this dataset?


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-11-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich I did not attach this file in relation to T206214#4690469, did I?

F26791999: WD_edits_Notebook.nb.html


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-25 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich I did not attach this file in relation to T206214#4690469, did I?

F26791999: WD_edits_Notebook.nb.html


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Daniel_Mietchen My bad: I gave you the wrong Phab ticket for this. Sorry. Please: https://phabricator.wikimedia.org/T193969


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-23 Thread Daniel_Mietchen
Daniel_Mietchen added a comment.
Thanks for the progress so far. I'd like to see the tool expanded to cover usage of Wikidata outside namespace 0 (including on Wikidata itself) and in SPARQL queries/API calls.


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich Here we go:


– Please let me know whether the alternative visualization of edits vs. discussions in Q3.1 works for you;
– Q4.2 is now fixed: log10() is used instead of the natural logarithm; data points are labeled in order to preserve the absolute values; the y-axis simply must be scaled because of the great disproportion between the number of edits (i.e. revisions on main namespaces) and the number of discussions (i.e. revisions on talk pages).

>> I would like to create a little poster based on these, so I try to make it graspable for "everyone".


If you want to print that poster someday, let me know so that I can produce 300dpi printable graphics from R for you.

Your feedback, please. Thank you!


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-22 Thread Jan_Dittrich
Jan_Dittrich added a comment.
>> 2.1 Q2.x are like 1.x, but only for September 2018.

>> Yes. Initially, you asked for an overview of the last 30 days or so. However, the wmf.mediawiki_history Hadoop table, where the data for this analysis is found, …

That's fine. I only wanted to check whether edit patterns changed dramatically in recent years. The actual month, and exactly how many days it covers, is thus not that crucial as long as the period is recent-ish and long enough to smooth out possible extremes, so all is fine.

>> Q3…

>> As you have already observed it did cause a bit of a headache to me too. Please allow me some time to figure out the most informative approach to this visualization and as soon as I have it you will have it too. Thank you.

Great.

>> The axes are off: x represents Month-Year, y represents log(number of Revisions). I will correct this as soon as I am done with the tricky Q3 diagrams.

Thanks! I think it is useful, but I get the scaling issues… could you use a base-10 (instead of base-e) log scale for this one? I think it would be easier to understand, and we could label the axis 1-10-100… (so the "actual" counts).
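The request above can be illustrated with a minimal sketch. It assumes nothing about the actual R notebook: the function names `ln_to_log10` and `log10_ticks` and the example maximum of 250000 revisions are hypothetical. It shows that a natural-log value converts to base 10 by dividing by ln(10), and that the tick positions on a base-10 axis can be labeled with the plain counts 1, 10, 100, and so on.

```python
import math


def ln_to_log10(ln_value):
    """Convert a natural-log value to the equivalent base-10 log value."""
    return ln_value / math.log(10)


def log10_ticks(max_count):
    """Return (position, label) pairs for a base-10 axis covering 1..max_count.

    position is the exponent on the log10 scale, label is the actual count.
    max_count must be at least 1.
    """
    top = math.ceil(math.log10(max_count))
    return [(exponent, 10 ** exponent) for exponent in range(top + 1)]


# Hypothetical example: the largest monthly value is 250000 revisions.
ticks = log10_ticks(250000)
for position, label in ticks:
    print(position, label)
```

With labels like these, a reader sees the "actual" counts on the axis while the spacing stays logarithmic.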

Last but not least, and for some context:

I would like to create a little poster based on these, so I try to make it graspable for "everyone". I think we are well on our way there. This explains why I would go with natural frequencies and labels as often as possible, and likewise with linear scaling where possible and, where not, a base-10 rather than a base-e scale.


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich Here we go:

– 1.1. "Q1.1 Checking for power-law behavior" has two log-scaled axes, if I read it correctly. I do not get what the numerical labels on the y axis mean – is this number of users, but after log transformation, so 1 users becomes 9.21… on the log scale? The log is a natural logarithm, correct? (2.718281828459…)

Correct. Both the number of revisions (x-axis) and the number of users who made the respective number of revisions (y-axis) were log-transformed (it is a natural log indeed, base = e), and then plotted against each other.
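To make the axis labels concrete, here is a minimal sketch of the transformation described above. The toy data and the function names are hypothetical and are not taken from the actual notebook. It counts how many users made each number of revisions, takes the natural log of both quantities, and shows how to read a label such as 9.21 back as a count: exp(9.21) is roughly 10,000 users.

```python
import math
from collections import Counter


def users_per_revision_count(revisions_per_user):
    """Map each revision count to the number of users with exactly that count."""
    return Counter(revisions_per_user)


def log_log_points(distribution):
    """Return (ln(revisions), ln(users)) pairs, sorted by revision count."""
    return [(math.log(revisions), math.log(users))
            for revisions, users in sorted(distribution.items())]


# Hypothetical toy data: total revisions made by each of ten accounts.
toy_revisions = [1, 1, 1, 1, 1, 1, 2, 2, 5, 40]
points = log_log_points(users_per_revision_count(toy_revisions))

# Reading a natural-log axis label back as a count of users.
label = 9.21
approx_users = math.exp(label)  # roughly 10,000
```

If the points fall close to a straight line on these log-log axes, that is consistent with power-law behavior, although a straight-looking plot alone is not a formal test.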

– 1.2. Q1.2 and Q1.3 show the same data as Q1.1, but in natural numbers without transformations,

Yes. Except that Q1.3 is misnamed: it should be titled "Q1.3 Histogram: Distribution of the number of users across revisions". But yes, both the Q1.2 and Q1.3 sections deal with the same data as Q1.1, except that they operate on the natural (count) scale, not on log-transforms.

– 1.3. The diagrams tell "We have many accounts that edited a few times and a small amount of accounts which edit(ed) a lot"

Exactly.

– 1.4. To clarify, this is how many revisions a specific account has created until the data was acquired? (I call "revisions" "edits", it seems then)

That is true. Q1.3 is the distribution of the number of users who have made a particular number of revisions (edits, if you prefer), up to the moment of data acquisition for this analysis.

– 2.1 Q2.x are like 1.x, but only for September 2018.

Yes. Initially, you asked for an overview of the last 30 days or so. However, the wmf.mediawiki_history Hadoop table, where the data for this analysis is found, is partitioned by snapshot, where a snapshot represents all the data up to a particular month (e.g. everything since the beginning of history up to August 2018, September 2018, October 2018, etc.). At the time I ran the analysis, the September snapshot was the freshest one available, so you don't see any of the October 2018 data here, for example. We will not be able to do October 2018 before November 2018; it takes some time for that table to update.
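The snapshot logic can be sketched as follows. This is only an illustration: the "YYYY-MM" snapshot label format, the function names and the toy dates are assumptions, and no Hadoop query is involved. A snapshot covers every revision up to and including its month, so a "September 2018 only" view is obtained by filtering the snapshot down to that month.

```python
from datetime import date


def parse_snapshot(snapshot):
    """Split an assumed 'YYYY-MM' snapshot label into (year, month)."""
    year, month = snapshot.split("-")
    return int(year), int(month)


def covered_by_snapshot(day, snapshot):
    """True if a revision made on `day` falls at or before the snapshot month."""
    return (day.year, day.month) <= parse_snapshot(snapshot)


def in_snapshot_month(day, snapshot):
    """True if a revision made on `day` falls within the snapshot month itself."""
    return (day.year, day.month) == parse_snapshot(snapshot)


# Hypothetical revision dates.
revision_days = [date(2012, 10, 30), date(2018, 9, 15), date(2018, 10, 2)]

all_time = [d for d in revision_days if covered_by_snapshot(d, "2018-09")]
september_only = [d for d in revision_days if in_snapshot_month(d, "2018-09")]
```

In this toy example the October 2018 revision is absent from both lists, which mirrors why no October data appears in the analysis.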

– 2.2 The diagrams tell "recent edit counts by user follow the same pattern (log) as the general edit count distribution"

Q2.2 does not use any log scaling (nor does the corresponding Q1.2), but yes, it tells us that in qualitative terms the distribution of the number of users who have made a particular number of edits is no different from what we saw when we analyzed all of the data in the Q1 sections. One could also compare Q2.3 to Q1.3 and reach the same judgment by eyeballing alone. I am not very enthusiastic about going into strict statistical, model-based comparisons between the two distributions here, but if you really want to verify that the claim holds as stated... let me know.

– 3. Q3 The diagrams are indeed tricky to read. I would read the crosstabular one like a scatterplot, however, in this case, it would need jitter, would it? Maybe a "heatmap" like approach might also be OK: A 2-D-bin would be darker if more edit/discussion counts fall into it.

As you have already observed it did cause a bit of a headache to me too. Please allow me some time to figure out the most informative approach to this visualization and as soon as I have it you will have it too. Thank you.
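The "heatmap" idea suggested above can be sketched as a 2-D binning step. This is a hypothetical illustration, not the approach finally used in the notebook: the order-of-magnitude bins, the function names and the toy pairs are all assumptions. Each account is a pair (edits, discussions), and each cell counts how many accounts fall into it. A plotting layer would then shade a cell darker the higher its count.

```python
from collections import Counter


def magnitude_bin(count):
    """Order-of-magnitude bin: 0 for zero, 1 for 1-9, 2 for 10-99, and so on."""
    return len(str(count)) if count > 0 else 0


def bin_2d(pairs):
    """Count accounts per (edit bin, discussion bin) cell."""
    return Counter((magnitude_bin(edits), magnitude_bin(discussions))
                   for edits, discussions in pairs)


# Hypothetical (edits, discussions) pairs, one per account.
toy_accounts = [(3, 0), (7, 0), (12, 1), (85, 4), (1500, 30), (2400, 12)]
cells = bin_2d(toy_accounts)

for cell, accounts in sorted(cells.items()):
    print(cell, accounts)
```

Binning avoids the overplotting that would otherwise require jitter in a scatterplot, because coinciding points add to a cell count instead of hiding each other.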

– 4. Q4.2 is again a diagram I am unable to read well. Could it be that the labels are off? X seems to show dates, but y seems to show dates (year-month), too, but log scaled? So I could not make sense of it.

Oh, oh, oh: sorry. The axes are off: x represents Month-Year, y represents log(number of Revisions). I will correct this as soon as I am done with the tricky Q3 diagrams.

Please let me know if you have any additional questions. Thank you for your feedback!


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich Thank you for your comments. I will provide all necessary explanations later in the evening.


[Wikidata-bugs] [Maniphest] [Commented On] T206214: Basic data on Wikidata use: Edit counts, frequency

2018-10-19 Thread Jan_Dittrich
Jan_Dittrich added a comment.
@GoranSMilovanovic thanks!

Some questions for understanding it right:

– 1 
– 1.1. "Q1.1 Checking for power-law behavior" has two log-scaled axes, if I read it correctly. I do not get what the numerical labels on the y axis mean – is this number of users, but after log transformation, so 1 users becomes 9.21… on the log scale? The log is a natural logarithm, correct? (2.718281828459…)
F26633276: image.png
– 1.2. Q1.2 and Q1.3 show the same data as Q1.1, but in natural numbers without transformations, 
F26633280: image.png
– 1.3. The diagrams tell "We have many accounts that edited a few times and a small amount of accounts which edit(ed) a lot"
– 1.4. To clarify, this is how many revisions a specific account has created until the data was acquired? (I call "revisions" "edits", it seems then)
– 2.1 Q2.x are like 1.x, but only for September 2018.
– 2.2 The diagrams tell "recent edit counts by user follow the same pattern (log) as the general edit count distribution"
– 3. Q3 The diagrams are indeed tricky to read. I would read the crosstabular one like a scatterplot, however, in this case, it would need jitter, would it? Maybe a "heatmap" like approach might also be OK: A 2-D-bin would be darker if more edit/discussion counts fall into it. 
– 4. Q4.2 is again a diagram I am unable to read well. Could it be that the labels are off? X seems to show dates, but y seems to show dates (year-month), too, but log scaled? So I could not make sense of it.