GoranSMilovanovic claimed this task.
GoranSMilovanovic added a project: User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T255949
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: GoranSMilovanovic
Cc: Aklapper, GoranSMilovanovic
GoranSMilovanovic added a comment.
@Lydia_Pintscher as agreed in our 1:1 today:
- criterion: do not consider property-value pairs that were not reviewed by
at least 5 editors;
- criterion: 95% acceptance rate, meaning that everything up to 19
decisions must have a consensus.
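A minimal sketch of how these two criteria could be applied, assuming each property-value pair comes with counts of accepting and rejecting decisions and a count of distinct reviewing editors. The function and parameter names are hypothetical and not taken from the task:

```python
def passes_criteria(accepted, rejected, n_editors,
                    min_editors=5, min_rate_percent=95):
    """Return True if a property-value pair meets both criteria.

    Assumption: 'accepted' and 'rejected' are decision counts and
    'n_editors' is the number of distinct editors who reviewed the pair.
    """
    total = accepted + rejected
    if n_editors < min_editors or total == 0:
        return False
    # Integer comparison avoids floating point rounding at the threshold:
    # accepted / total >= 0.95  <=>  100 * accepted >= 95 * total
    return 100 * accepted >= min_rate_percent * total
```

With a 95% threshold a single rejection is only tolerated from 20 decisions onwards (19/20 = 95%), which is why anything with up to 19 decisions has to be unanimous.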
TASK
GoranSMilovanovic added a comment.
Preliminary results based on T253552#6172533
<https://phabricator.wikimedia.org/T253552#6172533> @Ladsgroup datasets:
Per datatype:
datatype accepted rejected ratio
entity-type 419 119 3.52
text 3
GoranSMilovanovic added a comment.
@Ladsgroup @Addshore Do you need any help around this thing?
TASK DETAIL
https://phabricator.wikimedia.org/T154601
To: GoranSMilovanovic
Cc: Lucas_Werkmeister_WMDE
GoranSMilovanovic added a comment.
@Lydia_Pintscher @Addshore Given the status reset - T119976#6178863
<https://phabricator.wikimedia.org/T119976#6178863> - of this task, what do we
say: go, no go, priority?
TASK DETAIL
https://phabricator.wikimedia.org/T119976
GoranSMilovanovic added a comment.
@Ladsgroup Thanks for the datasets.
@darthmon_wmde Thanks for the follow up.
@ItamarWMDE Nice to meet you too :) My LDAP is GoranSMilovanovic and I was
able to log in to Toolforge with it (+2FA) just a minute ago.
TASK DETAIL
https
GoranSMilovanovic added a comment.
Could someone please ping me when we have the data for this, and let me know
where the data live?
TASK DETAIL
https://phabricator.wikimedia.org/T253552
To
GoranSMilovanovic claimed this task.
GoranSMilovanovic added projects: WMDE-FUN-Sprint-2020-04-27,
User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T253552
To: GoranSMilovanovic
Cc
GoranSMilovanovic closed this task as "Resolved".
GoranSMilovanovic added a comment.
@WMDE-leszek Res, non verba.
TASK DETAIL
https://phabricator.wikimedia.org/T248308
To: GoranSMilo
GoranSMilovanovic added a comment.
@WMDE-leszek @darthmon_wmde Do we need anything else here in the foreseeable
future?
TASK DETAIL
https://phabricator.wikimedia.org/T248308
To: GoranSMilovanovic
Cc
GoranSMilovanovic added a subscriber: Samantha_Alipio_WMDE.
GoranSMilovanovic added a comment.
@WMDE-leszek @darthmon_wmde @Lydia_Pintscher @Addshore @Gehel
@Samantha_Alipio_WMDE
This could be useful for tomorrow's discussion on repeated queries:
F31802788: queries_Clustered
GoranSMilovanovic added a comment.
@WMDE-leszek Please, what is the status of this task?
TASK DETAIL
https://phabricator.wikimedia.org/T240466
To: GoranSMilovanovic
Cc: Aklapper, Addshore, Jan_Dittrich
GoranSMilovanovic added a comment.
Update `Tue 28 Apr 2020 02:17:33 AM UTC`
Here goes the update report on SPARQL feature selection via XGBoost:
F31783672: WDQS Endpoint Analytics_20200427_B.nb.html
<https://phabricator.wikimedia.org/F31783672>
- The model performan
GoranSMilovanovic added a comment.
Update `Mon 27 Apr 2020 10:31:05 PM UTC`:
**The most frequently observed SPARQL queries dataset**
- Selection criteria: the query was observed >= 50 times in the WDQS endpoint
sample (approx. `1M` queries, `2020/04/01` - `2020/04/21`).
- For e
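The selection criterion above could be sketched as follows, assuming the sample is available as a plain list of query strings. The function name and the toy data are hypothetical:

```python
from collections import Counter


def frequent_queries(queries, min_count=50):
    """Keep the queries observed at least `min_count` times in the sample."""
    counts = Counter(queries)
    return {query: n for query, n in counts.items() if n >= min_count}


# Toy sample: only query "A" reaches the >= 50 threshold.
sample = ["A"] * 50 + ["B"] * 49 + ["C"]
selected = frequent_queries(sample)
```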
GoranSMilovanovic added a comment.
Update `Mon 27 Apr 2020 10:10:23 PM UTC`:
**Final reports**
- Here is **Part A** of the Final Report, which covers the
Exploratory Data Analysis (EDA) only: (1) the characteristics of
the sample of SPARQL queries used in
GoranSMilovanovic added a comment.
@Addshore
- `queries_vocabulary.csv` - all features extracted from approx. `1M` SPARQL
queries, 1 - 21. April 2020; statistic: total feature frequency (including
multiple occurrences of the same feature in a query);
- `queries_coverage.csv` - all
GoranSMilovanovic added a comment.
Update `Fri 24 Apr 2020 04:01:17 AM CEST` and in respect to T248308#6062005
<https://phabricator.wikimedia.org/T248308#6062005>:
- A new sample of approximately `1M` SPARQL queries was drawn from the new
events schema
<https://gerrit.wikime
GoranSMilovanovic added a comment.
@Gehel First of all, thank you for all the insights that you have brought
into the discussion thus far.
> There is probably better / more useful information published as part of the
new events
<https://gerrit.wikimedia.org/r/plugins/gitiles/med
GoranSMilovanovic added a comment.
Update `Thu Apr 16 10:21:32 UTC 2020`:
- following the meeting with thephp.cc yesterday:
- The modelling approach will change from more predictive to more
explanatory, i.e. the variables that could not be used for prediction
(`cache_status`, for
GoranSMilovanovic added a comment.
Update `wed, 15. apr 2020. 09:56:39 CEST`
- First report on modelling results, to be discussed in a meeting `10:00
CEST` today.
F31757331: WDQS Endpoint Analytics_20200414.nb.html
<https://phabricator.wikimedia.org/F31757331>
TASK DETAIL
GoranSMilovanovic added a comment.
Update `Thu 09 Apr 2020 10:19:24 PM UTC`:
- XGBoost w. `gbtree` on a binary classification problem ("typical" vs.
"extreme outlier" server response times) cross-validation started on
**stat1005**;
- using 9 data sets with varyin
GoranSMilovanovic added a comment.
- Update `Mon 06 Apr 2020 04:54:47 PM CEST`: modeling extreme outliers on
server response time (based on `time_firstbyte` from `wmf.webrequest`): **95%**
accuracy on both train and held out test data set.
- Note: consider re-formulating the problem as a
GoranSMilovanovic added a comment.
Current status:
- pilot/research experiments completed:
- research phase:
- model server response times from the features extracted as atomic
elements of the SPARQL queries in the sample;
- experimented with various feature selections (size
GoranSMilovanovic added a comment.
Current status:
- parsed SPARQL; initial, approximate feature engineering phase completed;
- NEXT: validating the features by optimizing server response time
(`time_firstbyte` from `wmf.webrequest`) by
- XGBoost with cross-validation;
- goal
GoranSMilovanovic added a subscriber: Jakob_WMDE.
GoranSMilovanovic added a comment.
- Meeting with @Jakob_WMDE today, very helpful:
- learned about some JS libraries that parse SPARQL
- this might help me a lot to improve the current feature engineering.
Current status
GoranSMilovanovic added a comment.
`Fri 27 Mar 2020 11:16:16 AM UTC`
- incorrect HiveQL sampling fixed (the first sample encompassed only queries
from the first hour of each day from `2020-03-01` to `2020-03-20`);
- the new sample, encompassing approximately 1% of all queries from
GoranSMilovanovic added a comment.
`Thu 26 Mar 2020 11:35:59 PM UTC`
- a sample of SPARQL queries from `wmf.webrequest` was obtained by randomly
sampling 1% of all queries that were sent out to WDQS on each day from
`2020-03-01` to `2020-03-20`;
- the sample is now cleaned by removing
GoranSMilovanovic added a project: User-GoranSMilovanovic.
GoranSMilovanovic added subscribers: WMDE-leszek, Lydia_Pintscher, Addshore.
GoranSMilovanovic added a comment.
Specification:
- fetch a random sample of SPARQL queries from `/sparql` and
`/bigdata/namespace/wdq/sparql` paths in
GoranSMilovanovic closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T242631
To: GoranSMilovanovic
Cc: WMDE-leszek, Aklapper, Jan_Dittrich, darthmon_wmde, Nandana,
GoranSMilovanovic added a comment.
@Jan_Dittrich Great! Would you like to have the ETL procedure put on a crontab
to run a regular monthly update, or shall we say that you just ask me when you
need the data again?
TASK DETAIL
https://phabricator.wikimedia.org/T242631
GoranSMilovanovic added a comment.
@Jan_Dittrich @WMDE-leszek
- results with anonymized user_ids shared with @Jan_Dittrich via e-mail (cc:
@WMDE-leszek);
- awaiting feedback;
- no public results before we ask for a public data set review from the
#analytics <ht
GoranSMilovanovic added a comment.
@Lydia_Pintscher Is any additional work needed here, or can the ticket be
resolved?
TASK DETAIL
https://phabricator.wikimedia.org/T234161
To: GoranSMilovanovic
Cc
GoranSMilovanovic added a comment.
@Jan_Dittrich
The only thing that I do not understand here is the following planned column:
> (pseudonymous) users
Do you need (1) a split between anonymous vs. non-anonymous editors in this
column, or (2) a column where each particular edi
GoranSMilovanovic added a project: User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T242631
To: GoranSMilovanovic
Cc: WMDE-leszek, Aklapper, Jan_Dittrich, darthmon_wmde, Nandana, Lahi
GoranSMilovanovic added a comment.
@Addshore @Jan_Dittrich Here is a summary of the approach to collecting the
baseline data, following our meeting today:
**Step 1. Filter out revisions where the value of the statement is changed**
- we will use the `wmf.mediawiki_history` tab
GoranSMilovanovic added a comment.
@Addshore Well, now it sounds even more complicated than in the ticket
description.
I am for a call on this too. Let me just provide a few observations in
relation to what has been said and suggested until now.
> I do not think we want to use
GoranSMilovanovic added a comment.
@Lydia_Pintscher Finally, a "non-Commons" data quality report is ready:
Wikidata Quality Report - Commons Excluded
<http://wmdeanalytics.wmflabs.org/Wikidata%20Quality%20Report_nonCommons.nb.html>.
This one encompasses only items that are
GoranSMilovanovic added a comment.
@Lydia_Pintscher We now have a separate WD Quality Report for Wikimedia
Commons
<http://wmdeanalytics.wmflabs.org/Wikidata%20Quality%20Report_Commons.nb.html>.
Working on its complementary, "non-Commons", quality assessment now.
TAS
GoranSMilovanovic added a comment.
My initial observations - **please comment:**
From the wmf.mediawiki_history
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history>
table (Data Lake, Hadoop):
select page_title, event_comment from wmf.mediawiki_h
GoranSMilovanovic added a comment.
My initial observations - **please comment**:
- The wb_changes <https://www.mediawiki.org/wiki/Wikibase/Schema/wb_changes>
schema might be what we need?
- This schema is poorly documented (see the doc
<https://www.mediawiki.org/wiki/Wikiba
GoranSMilovanovic claimed this task.
GoranSMilovanovic added projects: WMDE-Analytics-Engineering,
User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T240466
To: GoranSMilovanovic
Cc
GoranSMilovanovic added a comment.
@JAllemandou Thank you - as ever!
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: GoranSMilovanovic
Cc: Groceryheist, MGerlach, WMDE-leszek, abian, leila
GoranSMilovanovic added a comment.
@JAllemandou Do you think it would be possible to produce a new version of
this data set?
The latest update seems to be: `2019-10-03 09:29
/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20190902` - which you have
pointed me at in T209655#5543575
GoranSMilovanovic added a comment.
@Lydia_Pintscher I guess this task is completed now.
However, we might need a new ticket in relation to this:
- to re-factor most of the data engineering code to work in the analytics
cluster
- (it is now done in R on a single server by a process
GoranSMilovanovic closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T221965
To: GoranSMilovanovic
Cc: Lea_Lacroix_WMDE, RazShuty, Lydia_Pintscher, GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved".
GoranSMilovanovic added a comment.
- Dashboard <http://wmdeanalytics.wmflabs.org/WD_LanguagesLandscape/> online.
TASK DETAIL
https://phabricator.wikimedia.org/T223119
GoranSMilovanovic closed subtask T223119: WD Languages Landscape: statistics +
dashboards as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T221965
To: GoranSMilovanovic
Cc: Lea_La
GoranSMilovanovic closed this task as "Resolved".
GoranSMilovanovic added a subscriber: Lea_Lacroix_WMDE.
GoranSMilovanovic added a comment.
Hi @Envlh @Lea_Lacroix_WMDE and thank you for pointing this out to me.
The Wikidata External Identifiers
<http://wmdeanalytic
GoranSMilovanovic added a comment.
@Lydia_Pintscher
You can take a look at our WikidataCon2019 shared doc
<https://docs.google.com/document/d/1oYra3Kao9DzeJBLp_X0usbH7m4S8FMyBY2gajVa31ds/edit>
and see if you can make use of anything from the **Wikidata Languages
Landscape: Statisti
GoranSMilovanovic renamed this task from "WD Languages Landscape: fundamental
statistics" to "WD Languages Landscape: statistics + dashboards".
GoranSMilovanovic added a subscriber: WMDE-leszek.
GoranSMilovanovic updated the task description.
TASK DETAIL
https://phabr
GoranSMilovanovic closed subtask T223117: WD Languages Landscape: essential
properties and classes as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T221965
To: GoranSMilo
GoranSMilovanovic closed subtask T223118: WD Languages Landscape: fundamental
data sets as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T221965
To: GoranSMilovanovic
Cc: Lea_La
GoranSMilovanovic closed this task as "Resolved".
GoranSMilovanovic added a comment.
Data model: **solved**.
TASK DETAIL
https://phabricator.wikimedia.org/T223117
To: GoranSMilovanovic
Cc
GoranSMilovanovic added a comment.
- UNESCO and Ethnologue Language Status: **solved**.
- Number of speakers: **solved**.
TASK DETAIL
https://phabricator.wikimedia.org/T223118
To: GoranSMilovanovic
Cc
GoranSMilovanovic closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T223118
To: GoranSMilovanovic
Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic, dar
GoranSMilovanovic added a comment.
- Script variants: **solved**.
TASK DETAIL
https://phabricator.wikimedia.org/T223118
To: GoranSMilovanovic
Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic
GoranSMilovanovic added a comment.
@JAllemandou Awesome, thank you!
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: GoranSMilovanovic
Cc: WMDE-leszek, abian, leila, Ottomata, Nuria
GoranSMilovanovic added a comment.
@JAllemandou Would it be possible to have another update (beyond the most
recent `20190603`) of the dump in hdfs?
I would like to present some of the analytical systems based on this at
WikidataCon 2019, and would be **very, very grateful** if a new
GoranSMilovanovic created this task.
GoranSMilovanovic added projects: Wikidata, WMDE-Analytics-Engineering,
User-GoranSMilovanovic.
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
- Compare Wikidata data quality in respect to Commons vs. everything else
WDCM usage stats
GoranSMilovanovic added a comment.
@Lydia_Pintscher A slightly adjusted version of the report:
F30478577: Wikidata Quality Report.nb.html
<https://phabricator.wikimedia.org/F30478577>
- no qualitative differences in the results/conclusions;
- addition: taking care to elimina
GoranSMilovanovic added a comment.
@Lydia_Pintscher Here is the final version of the Report, including the
timeline of the latest revids made for A, B, C, D, and E class items:
F30435077: Wikidata Quality Report.nb.html
<https://phabricator.wikimedia.org/F30435077>
Please
GoranSMilovanovic added a comment.
Here is a new version of the report with the Grading Scheme
<https://www.wikidata.org/wiki/Wikidata:Item_quality#Grading_scheme> for
Wikidata items included:
F30434504: Wikidata Quality Report.nb.html
<https://phabricator.wikimedia.org/
GoranSMilovanovic added a comment.
@Lydia_Pintscher @RazShuty @WMDE-leszek
Here's a prototype of a Wikidata Quality Report.
F30430120: Wikidata Quality Report.nb.html
<https://phabricator.wikimedia.org/F30430120>
NEXT STEPS:
- Include a bit more info on ORES in
GoranSMilovanovic added a subscriber: WMDE-leszek.
GoranSMilovanovic added a comment.
- Analytics/visualizations - **DONE.**
@Lydia_Pintscher @RazShuty @WMDE-leszek Here's a glimpse of what we've found
out thus far:
1. For all Wikidata items that have received an OR
GoranSMilovanovic added a comment.
Status:
- working on analytics/visualizations now;
- next steps: dashboard.
TASK DETAIL
https://phabricator.wikimedia.org/T195702
To: GoranSMilovanovic
Cc
GoranSMilovanovic removed a project: User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T196193
To: GoranSMilovanovic
Cc: Aklapper, GoranSMilovanovic, Lydia_Pintscher, abian
GoranSMilovanovic added a comment.
@Halfak Thank you, Aaron.
TASK DETAIL
https://phabricator.wikimedia.org/T195702
To: GoranSMilovanovic
Cc: darthmon_wmde, Ladsgroup, elal, Halfak, RazShuty, hoo
GoranSMilovanovic added a subscriber: darthmon_wmde.
GoranSMilovanovic added a comment.
@Ladsgroup @RazShuty @darthmon_wmde
Amir, right before our meeting, what we need here is simple:
- take a look at the sample data set in T195702#5208632
<https://phabricator.wikimedia.org/T195
GoranSMilovanovic added a comment.
@Lydia_Pintscher @RazShuty
Something to begin with:
- each node is a language (Wikimedia language codes
<https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all> are
used);
- each language points towards the three most s
GoranSMilovanovic added a comment.
- the Jaccard similarity and distance matrices: testing, the procedure is
memory efficient but slow (subsetting the dgCMatrix class matrix...):
- **DONE.** We can have the Jaccard distances here too.
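For reference, a minimal sketch of the Jaccard similarity and distance between two languages, assuming each language is represented as the set of items associated with it. That representation and the names below are assumptions for illustration only; the sketch does not reproduce the batch processing over `dgCMatrix` sparse matrices:

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two collections of item ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty sets are identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard distance is one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical item sets for two languages: 2 shared items out of 4 in total.
lang_x = {"Q1", "Q2", "Q3"}
lang_y = {"Q2", "Q3", "Q4"}
similarity = jaccard_similarity(lang_x, lang_y)
```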
TASK DETAIL
https://phabricator.wikimedia.org/T223118
GoranSMilovanovic added a comment.
- Batch processing over sparse matrices (`dgCMatrix` class) is now employed
to compute
- the co-occurrence data set: **success**, using approx. an order of magnitude
fewer resources than the previously employed procedure, and
- the Jaccard similarity and
GoranSMilovanovic added a comment.
@Lea_WMDE Hm, this might be the solution - `dygraph` set to `dyLegend(show =
'follow')`, please check:
http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/
**Note.** This was the initial solution and there is one thing I don't like
ab
GoranSMilovanovic added a comment.
@Lea_WMDE
Strange.
Lea, please let me know which browser you are using. I have tested the
dashboard on Chromium and Mozilla Firefox under Ubuntu; the WDCM system, using
the same front-end technology (RStudio Shiny) was tested over an even broader
GoranSMilovanovic added a comment.
- given how often `stat1007` is used by analysts,
- it barely has the resources for the computations that we need here (the
languages x languages contingency table; takes at least ~25 GB to compute);
- a fail-safe, batch processing procedure to compute
GoranSMilovanovic added a comment.
@Lea_WMDE
> Could you add the info why wikistats2 data differs from these graphs to the
explanatory text?
Done: dashboard <http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/>.
> Apart from that I cannot see the total averag
GoranSMilovanovic added a comment.
@Lea_WMDE The Total Average is back: dashboard
<http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/>.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
GoranSMilovanovic added a comment.
@Milimetric Thanks for the clarification, Dan.
@Lea_WMDE This implies that
- the difference between the numbers reported on our Pageviews per namespace
Dashboard <http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/> and
Wikistats2
GoranSMilovanovic added a subscriber: Milimetric.
GoranSMilovanovic added a comment.
@Lea_WMDE Ok, here is a direct test (Pyspark code against the
wmf.pageviews_hourly
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly>
table):
pw = sqlConte
GoranSMilovanovic added a comment.
@Lea_WMDE So that is a difference of one order of magnitude, which looks
plainly impossible. Please let me check.
I guess the difference of this magnitude could not be a consequence of the
fact that we have picked only four namespaces (Entity, Property
GoranSMilovanovic added a comment.
@Lea_WMDE Take a look, please:
http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To: GoranSMilovanovic
GoranSMilovanovic added a comment.
@Lea_WMDE I am on it.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To: GoranSMilovanovic
Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE, darthmon_wmde
GoranSMilovanovic added a comment.
@Lea_WMDE
- On the vertical axes the dashboard now uses `K`, `M`, and `B` for
thousands, millions, and billions of pageviews, respectively.
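A minimal sketch of this kind of axis label abbreviation, with one decimal place. The function name and the rounding choice are assumptions for illustration, not the dashboard's actual code:

```python
def abbreviate(n):
    """Format a pageview count with K, M, or B suffixes (one decimal place)."""
    for threshold, suffix in ((1_000_000_000, "B"),
                              (1_000_000, "M"),
                              (1_000, "K")):
        if abs(n) >= threshold:
            return f"{n / threshold:.1f}{suffix}"
    # Values below one thousand are shown unchanged.
    return str(n)


labels = [abbreviate(v) for v in (999, 1_500, 2_500_000, 3_000_000_000)]
```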
TASK DETAIL
https://phabricator.wikimedia.org/T208567
GoranSMilovanovic added a comment.
@Lea_WMDE
- The dashboard is now running a regular daily update;
- fixing the axis labels now.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To
GoranSMilovanovic added a comment.
@Lea_WMDE I am on it, putting the dashboard on regular updates + fixing the
labels to include decimal points.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To
GoranSMilovanovic added a subscriber: Halfak.
GoranSMilovanovic added a comment.
@Lydia_Pintscher
> That's "our" information. And then we have links/external identifiers to
say 3 libraries that also have information about X. We want to somehow quantify
the latter for
GoranSMilovanovic added a comment.
@Lydia_Pintscher
> We should try to find ways to quantify this information.
Would you allow me to become creative in that respect and try to figure out
what statistics we could offer publicly?
> We should track it over time and publis
GoranSMilovanovic added a comment.
@Lea_WMDE You can now test your new dashboard:
http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/
- Next steps:
- introduce client-side dependency, and
- put on a regular daily update as soon as
- T227905 <ht
GoranSMilovanovic added a comment.
- data set review requested from #analytics
<https://phabricator.wikimedia.org/tag/analytics/> in T227905
<https://phabricator.wikimedia.org/T227905>;
- next steps:
- visualizations + dashboard;
- test, deploy.
TASK DE
GoranSMilovanovic added a subtask: T227905: Public Data Review Needed.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To: GoranSMilovanovic
Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE
GoranSMilovanovic added a comment.
- data set production - completed.
- Next steps:
- orchestrate Pyspark from an R environment where post-processing will take
place;
- prepare data for visualizations and export to published data sets;
- visualizations + dashboard;
- test
GoranSMilovanovic added a comment.
@Lea_WMDE I guess `640` is the `EntitySchema` namespace (figured this out
from this Gerrit patch
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseSchema/+/506471/2/extension.json>,
since it is not documented in the Wikidata namespaces
GoranSMilovanovic claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To: GoranSMilovanovic
Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE, darthmon_wmde, Nandana,
Lahi, Gq86
GoranSMilovanovic added a comment.
@Lea_WMDE Do we have any additional requirements here or shall we resolve the
ticket?
TASK DETAIL
https://phabricator.wikimedia.org/T220977
To: GoranSMilovanovic
Cc
GoranSMilovanovic added a comment.
@JAllemandou Thanks for the recent `20190603` dump copy in HDFS.
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: GoranSMilovanovic
Cc: abian, leila, Ottomata
GoranSMilovanovic added a comment.
@Lydia_Pintscher @RazShuty @Halfak
Ok, here's what I've got:
item revision timestamp usage
1 Q36524 924799644 2019-04-26 06:29:25.0 6791020
2 Q54919 929383859 2019-04-30 21:14:14.0 4376000
3 Q423048 9191
GoranSMilovanovic added a comment.
@JAllemandou Thanks for feedback!
@Lea_WMDE Given the current situation with the geo-localized edits (see
T220977#5186818 <https://phabricator.wikimedia.org/T220977#5186818>), do you
want me to proceed with the per continent analysis for pagevie
GoranSMilovanovic added a comment.
@Lea_WMDE Here we go:
- the following chart shows mobile edits vs. mobile pageviews separately for
users and spiders;
- what we can learn from this chart is that **the growth is certainly
natural**, given that the spiders have made a minimal number
GoranSMilovanovic claimed this task.
GoranSMilovanovic added projects: WMDE-Analytics-Engineering,
User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T195702
To: GoranSMilovanovic
Cc
GoranSMilovanovic added a comment.
@Lea_WMDE Yes, we do have a more or less steady increase in mobile edits on
Wikidata:
F29055561: MobileEdits_2019.png <https://phabricator.wikimedia.org/F29055561>
TASK DETAIL
https://phabricator.wikimedia.org/T220977