[Wikidata-bugs] [Maniphest] [Claimed] T255949: Provide usage statistics on Wikibase APIs

2020-06-22 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. GoranSMilovanovic added a project: User-GoranSMilovanovic. TASK DETAIL https://phabricator.wikimedia.org/T255949 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Aklapper, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Commented On] T253552: Detailed Reports from game DB

2020-06-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher as agreed in our 1:1 today: - criterion: do not consider property-value pairs that were not reviewed by at least 5 editors; - criterion: an acceptance rate of 95%, meaning that everything up to 19 decisions must have a consensus. TASK
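The two criteria above can be sketched in Python as follows. This is a minimal illustration, not the code used for the task: the `PairReview` record and the function name are hypothetical, and "acceptance rate" is assumed to mean the share of accepting decisions among all decisions on a pair. Under that reading, a pair with 19 or fewer decisions passes only when every decision is an acceptance, since a single rejection gives at most 18/19, which is below 95%.

```python
from dataclasses import dataclass

MIN_REVIEWERS = 5          # from the comment: at least 5 editors
MIN_ACCEPTANCE_PCT = 95    # from the comment: 95% acceptance rate


@dataclass
class PairReview:
    """Hypothetical record: decisions collected for one property-value pair."""
    accepted: int
    rejected: int


def passes_criteria(review: PairReview) -> bool:
    """Return True if the pair has enough reviews and a high enough acceptance rate."""
    total = review.accepted + review.rejected
    if total < MIN_REVIEWERS:
        return False
    # Integer arithmetic avoids floating-point surprises at exactly 95%.
    return review.accepted * 100 >= MIN_ACCEPTANCE_PCT * total
```

With 20 decisions a single rejection is tolerated (19/20 is exactly 95%), which matches the "up to 19 decisions" remark in the comment.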

[Wikidata-bugs] [Maniphest] [Commented On] T253552: Detailed Reports from game DB

2020-06-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Preliminary results based on T253552#6172533 <https://phabricator.wikimedia.org/T253552#6172533> @Ladsgroup datasets: Per datatype (columns: datatype, accepted, rejected, ratio): entity-type 419 119 3.52; text 3

[Wikidata-bugs] [Maniphest] [Commented On] T154601: Grafana: "wikidata-datamodel-terms" doesn't update anymore

2020-06-11 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Ladsgroup @Addshore Do you need any help around this thing? TASK DETAIL https://phabricator.wikimedia.org/T154601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Lucas_Werkmeister_WMDE

[Wikidata-bugs] [Maniphest] [Commented On] T119976: Track number of stubs on top 20 wikipedias

2020-05-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher @Addshore Given the status reset - T119976#6178863 <https://phabricator.wikimedia.org/T119976#6178863> - of this task, what do we say: go, no go, priority? TASK DETAIL https://phabricator.wikimedia.org/T119976 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Commented On] T253552: Detailed Reports from game DB

2020-05-28 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Ladsgroup Thanks for the datasets. @darthmon_wmde Thanks for the follow-up. @ItamarWMDE Nice to meet you too :) My LDAP is GoranSMilovanovic and I was able to log in to Toolforge with it (+2FA) just a minute ago. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T253552: Detailed Reports from game DB

2020-05-28 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Could someone please ping me when we have the data for this and let me know where the data live? TASK DETAIL https://phabricator.wikimedia.org/T253552 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Claimed] T253552: Detailed Reports from game DB

2020-05-28 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. GoranSMilovanovic added projects: WMDE-FUN-Sprint-2020-04-27, User-GoranSMilovanovic. TASK DETAIL https://phabricator.wikimedia.org/T253552 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Closed] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-05-19 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". GoranSMilovanovic added a comment. @WMDE-leszek Res, non verba. TASK DETAIL https://phabricator.wikimedia.org/T248308 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilo

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-05-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @WMDE-leszek @darthmon_wmde Do we need anything else here in the foreseeable future? TASK DETAIL https://phabricator.wikimedia.org/T248308 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-05-04 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: Samantha_Alipio_WMDE. GoranSMilovanovic added a comment. @WMDE-leszek @darthmon_wmde @Lydia_Pintscher @Addshore @Gehel @Samantha_Alipio_WMDE This could be useful for tomorrow's discussion on repeated queries: F31802788: queries_Clustered_3000

[Wikidata-bugs] [Maniphest] [Commented On] T240466: Measure the impact of Tainted References Wikidata feature

2020-05-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @WMDE-leszek Please, what is the status of this task? TASK DETAIL https://phabricator.wikimedia.org/T240466 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Aklapper, Addshore, Jan_Dittrich

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Tue 28 Apr 2020 02:17:33 AM UTC` Here goes the update report on SPARQL feature selection via XGBoost: F31783672: WDQS Endpoint Analytics_20200427_B.nb.html <https://phabricator.wikimedia.org/F31783672> - The model perfo

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Mon 27 Apr 2020 10:31:05 PM UTC`: **The most frequently observed SPARQL queries dataset** - Selection criteria: the query was observed >= 50 times in the WDQS endpoint sample (approx. `1M` queries, `2020/04/01` - `2020/04/21`). - For e
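The selection criterion above (a query observed at least 50 times in the sample) can be illustrated with a short Python sketch. It is an assumption that queries are compared as exact strings; the original analysis may have normalized or parsed them first, and the function name is hypothetical.

```python
from collections import Counter

MIN_OBSERVATIONS = 50  # from the comment: observed >= 50 times in the sample


def frequent_queries(queries, min_count=MIN_OBSERVATIONS):
    """Return {query: count} for queries seen at least min_count times.

    Assumption: two queries are "the same" only if their text is identical.
    """
    counts = Counter(queries)
    return {query: n for query, n in counts.items() if n >= min_count}
```

For example, `frequent_queries(["A"] * 50 + ["B"] * 49)` keeps only `"A"`.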

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Mon 27 Apr 2020 10:10:23 PM UTC`: **Final reports** - Here is **Part A** of the Final Report, which covers the Exploratory Data Analysis (EDA) only, encompassing: (1) the characteristics of the sample of SPARQL queries used

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Addshore - `queries_vocabulary.csv` - all features extracted from approx. `1M` SPARQL queries, 1 - 21. April 2020; statistic: total feature frequency (including multiple occurrences of the same feature in a query); - `queries_coverage.csv` - all

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Fri 24 Apr 2020 04:01:17 AM CEST` and in respect to T248308#6062005 <https://phabricator.wikimedia.org/T248308#6062005>: - A new sample of approximately `1M` SPARQL queries was drawn from the new events schema <https://gerrit.wikime

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Fri 24 Apr 2020 04:01:17 AM CEST` and in respect to T248308#6062005 <https://phabricator.wikimedia.org/T248308#6062005>: - A new sample of approximately `1M` SPARQL queries was drawn from the new events schema <https://gerrit.wikime

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-20 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Gehel First of all, thank you for all the insights that you have brought into the discussion thus far. > There is probably better / more useful information published as part of the new events <https://gerrit.wikimedia.org/r/plugins/gitiles/med

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Thu Apr 16 10:21:32 UTC 2020`: - following the meeting with thephp.cc yesterday: - The modelling approach will change from more predictive to more explanatory, i.e. the variables that could not be used for prediction (`cache_status

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `wed, 15. apr 2020. 09:56:39 CEST` - First report on modelling results, to be discussed in a meeting `10:00 CEST` today. F31757331: WDQS Endpoint Analytics_20200414.nb.html <https://phabricator.wikimedia.org/F31757331> TASK DETAIL

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update `Thu 09 Apr 2020 10:19:24 PM UTC`: - XGBoost w. `gbtree` on a binary classification problem ("typical" vs. "extreme outlier" server response times) cross-validation started on **stat1005**; - using 9 data sets with varyin

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-06 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Update `Mon 06 Apr 2020 04:54:47 PM CEST`: modeling extreme outliers on server response time (based on `time_firstbyte` from `wmf.webrequest`): **95%** accuracy on both train and held out test data set. - Note: consider re-formulating the problem

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-04-06 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Current status: - pilot/research experiments completed: - research phase: - model server response times from the features extracted as atomic elements of the SPARQL queries in the sample; - experimented with various feature selections (size

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-03-31 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Current status: - parsed SPARQL; initial, approximate feature engineering phase completed; - NEXT: validating the features by optimizing server response time (`time_firstbyte` from `wmf.webrequest`) by - XGBoost with cross-validation; - goal

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-03-31 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: Jakob_WMDE. GoranSMilovanovic added a comment. - Meeting with @Jakob_WMDE today, very helpful: - learned about some JS libraries that parse SPARQL - this might help me a lot to improve the current feature engineering. Current status

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-03-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. `Fri 27 Mar 2020 11:16:16 AM UTC` - incorrect HiveQL sampling fixed (the first sample encompassed only queries from the first hour of each day from `2020-03-01` to `2020-03-20`); - the new sample, encompassing approximately 1% of all queries from

[Wikidata-bugs] [Maniphest] [Commented On] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-03-26 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. `Thu 26 Mar 2020 11:35:59 PM UTC` - a sample of SPARQL queries from `wmf.webrequest` was obtained by randomly sampling 1% of all queries that were sent out to WDQS on each day from `2020-03-01` to `2020-03-20`; - the sample is now cleaned by removing
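The sampling step above was done in HiveQL against `wmf.webrequest`; the following is only a Python sketch of the same idea, with a hypothetical function name and an arbitrary seed. Each record is kept independently with probability 0.01, so every day (and every hour within a day) contributes roughly 1% of its queries.

```python
import random


def sample_queries(records, fraction=0.01, seed=42):
    """Keep each record independently with probability `fraction`.

    The seed is an assumption added for reproducibility; it is not from the source.
    """
    rng = random.Random(seed)
    return [record for record in records if rng.random() < fraction]
```

Because the decision is made per record rather than per time window, the sample is not tied to any particular part of the day.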

[Wikidata-bugs] [Maniphest] [Updated] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-03-23 Thread GoranSMilovanovic
GoranSMilovanovic added a project: User-GoranSMilovanovic. GoranSMilovanovic added subscribers: WMDE-leszek, Lydia_Pintscher, Addshore. GoranSMilovanovic added a comment. Specification: - fetch a random sample of SPARQL queries from `/sparql` and `/bigdata/namespace/wdq/sparql` paths

[Wikidata-bugs] [Maniphest] [Closed] T242631: Get user/editcount data to determine count at percentiles

2020-01-21 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T242631 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: WMDE-leszek, Aklapper, Jan_Dittrich, darthmon_wmde, Nandana,

[Wikidata-bugs] [Maniphest] [Commented On] T242631: Get user/editcount data to determine count at percentiles

2020-01-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich Great! Would you like to have the ETL procedure put on a crontab to run a regular monthly update, or shall we say you just ask me when you need the data again? TASK DETAIL https://phabricator.wikimedia.org/T242631 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T242631: Get user/editcount data to determine count at percentiles

2020-01-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich @WMDE-leszek - results with anonymized user_ids shared with @Jan_Dittrich via e-mail (cc: @WMDE-leszek); - awaiting feedback; - no public results before we ask for a public data set review from the #analytics <ht

[Wikidata-bugs] [Maniphest] [Commented On] T234161: WD Data Quality: compare quality vs usage on commons vs everything else

2020-01-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Is any additional work needed here, or can the ticket be resolved? TASK DETAIL https://phabricator.wikimedia.org/T234161 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Commented On] T242631: Get user/editcount data to determine count at percentiles

2020-01-14 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich The only thing that I do not understand here is the following planned column: > (pseudonymous) users Do you need (1) a split between anonymous vs. non-anonymous editors in this column, or (2) a column where each particular edi

[Wikidata-bugs] [Maniphest] [Updated] T242631: Get user/editcount data to determine count at percentiles

2020-01-14 Thread GoranSMilovanovic
GoranSMilovanovic added a project: User-GoranSMilovanovic. TASK DETAIL https://phabricator.wikimedia.org/T242631 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: WMDE-leszek, Aklapper, Jan_Dittrich, darthmon_wmde, Nandana, Lahi

[Wikidata-bugs] [Maniphest] [Commented On] T240466: Measure the impact of Tainted References Wikidata feature

2019-12-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Addshore @Jan_Dittrich Here is the summary of the approach to collect the baseline data, following our meeting today: **Step 1. Filter out revisions where the value of the statement is changed** - we will use the `wmf.mediawiki_history` table

[Wikidata-bugs] [Maniphest] [Commented On] T240466: Measure the impact of Tainted References Wikidata feature

2019-12-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Addshore Well, now it sounds even more complicated than in the ticket description. I am for a call on this too. Let me just provide a few observations in relation to what has been said and suggested until now. > I do not think we want to

[Wikidata-bugs] [Maniphest] [Commented On] T234161: WD Data Quality: compare quality vs usage on commons vs everything else

2019-12-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Finally, a "non-Commons" data quality report is ready: Wikidata Quality Report - Commons Excluded <http://wmdeanalytics.wmflabs.org/Wikidata%20Quality%20Report_nonCommons.nb.html>. This one encompasses only items that are

[Wikidata-bugs] [Maniphest] [Commented On] T234161: WD Data Quality: compare quality vs usage on commons vs everything else

2019-12-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher We now have a separate WD Quality Report for Wikimedia Commons <http://wmdeanalytics.wmflabs.org/Wikidata%20Quality%20Report_Commons.nb.html>. Working on its complementary, "non-Commons", quality assessment now. TAS

[Wikidata-bugs] [Maniphest] [Commented On] T240466: Measure the impact of Tainted References Wikidata feature

2019-12-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. My initial observations - **please comment:** From the wmf.mediawiki_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history> table (Data Lake, Hadoop): select page_title, event_comment from wmf.mediawiki_h

[Wikidata-bugs] [Maniphest] [Updated] T240466: Measure the impact of Tainted References Wikidata feature

2019-12-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. My initial observations - **please comment**: - The wb_changes <https://www.mediawiki.org/wiki/Wikibase/Schema/wb_changes> schema might be what we need? - This schema is poorly documented (see the doc <https://www.mediawiki.org/wiki/Wikiba

[Wikidata-bugs] [Maniphest] [Claimed] T240466: Measure the impact of Tainted References Wikidata feature

2019-12-12 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. GoranSMilovanovic added projects: WMDE-Analytics-Engineering, User-GoranSMilovanovic. TASK DETAIL https://phabricator.wikimedia.org/T240466 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFS

2019-12-05 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Thank you - as ever! TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Groceryheist, MGerlach, WMDE-leszek, abian, leila

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFS

2019-12-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Do you think it would be possible to produce a new version of this data set? The latest update seems to be: `2019-10-03 09:29 /user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20190902` - which you have pointed me at in T209655#5543575

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-10-28 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher I guess this task is completed now. However, we might need a new ticket in relation to this: - to re-factor most of the data engineering code to work in the analytics cluster - (it is now done in R on a single server by a process

[Wikidata-bugs] [Maniphest] [Closed] T221965: Wikidata Languages Landscape

2019-10-24 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T221965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Lea_Lacroix_WMDE, RazShuty, Lydia_Pintscher, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Closed] T223119: WD Languages Landscape: statistics + dashboards

2019-10-22 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". GoranSMilovanovic added a comment. - Dashboard <http://wmdeanalytics.wmflabs.org/WD_LanguagesLandscape/> online. TASK DETAIL https://phabricator.wikimedia.org/T223119 EMAIL PREFERENCES https://phabricator.wikimedia.or

[Wikidata-bugs] [Maniphest] [Unblock] T221965: Wikidata Languages Landscape

2019-10-22 Thread GoranSMilovanovic
GoranSMilovanovic closed subtask T223119: WD Languages Landscape: statistics + dashboards as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T221965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Lea_Lacroix_WMDE

[Wikidata-bugs] [Maniphest] [Closed] T235533: WDCM External Identifiers Dashboard is down

2019-10-16 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". GoranSMilovanovic added a subscriber: Lea_Lacroix_WMDE. GoranSMilovanovic added a comment. Hi @Envlh @Lea_Lacroix_WMDE and thank you for pointing this out to me. The Wikidata External Identifiers <http://wmdeanalytic

[Wikidata-bugs] [Maniphest] [Commented On] T223119: WD Languages Landscape: statistics + dashboards

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher You can take a look at our WikidataCon2019 shared doc <https://docs.google.com/document/d/1oYra3Kao9DzeJBLp_X0usbH7m4S8FMyBY2gajVa31ds/edit> and see if you can make use of anything from the **Wikidata Languages Landscape: Stat

[Wikidata-bugs] [Maniphest] [Retitled] T223119: WD Languages Landscape: statistics + dashboards

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic renamed this task from "WD Languages Landscape: fundamental statistics" to "WD Languages Landscape: statistics + dashboards". GoranSMilovanovic added a subscriber: WMDE-leszek. GoranSMilovanovic updated the task description. TASK DETAIL https://phabr

[Wikidata-bugs] [Maniphest] [Unblock] T221965: Wikidata Languages Landscape

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic closed subtask T223117: WD Languages Landscape: essential properties and classes as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T221965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Unblock] T221965: Wikidata Languages Landscape

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic closed subtask T223118: WD Languages Landscape: fundamental data sets as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T221965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Lea_Lacroix_WMDE

[Wikidata-bugs] [Maniphest] [Closed] T223117: WD Languages Landscape: essential properties and classes

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". GoranSMilovanovic added a comment. Data model: **solved**. TASK DETAIL https://phabricator.wikimedia.org/T223117 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - UNESCO and Ethnologue Language Status: **solved**. - Number of speakers: **solved**. TASK DETAIL https://phabricator.wikimedia.org/T223118 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Closed] T223118: WD Languages Landscape: fundamental data sets

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T223118 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic, dar

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-10-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Script variants: **solved**. TASK DETAIL https://phabricator.wikimedia.org/T223118 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFS

2019-10-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Awesome, thank you! TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: WMDE-leszek, abian, leila, Ottomata, Nuria

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFS

2019-09-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Would it be possible to have another update (beyond the most recent `20190603`) of the dump in HDFS? I would like to present some of the analytical systems based on this at WikidataCon 2019, and would be **very, very grateful** if a new

[Wikidata-bugs] [Maniphest] [Created] T234161: WD Data Quality: compare quality vs usage on commons vs everything else

2019-09-29 Thread GoranSMilovanovic
GoranSMilovanovic created this task. GoranSMilovanovic added projects: Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION - Compare Wikidata data quality in respect to Commons vs. everything else WDCM usage stats

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-26 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher A slightly adjusted version of the report: F30478577: Wikidata Quality Report.nb.html <https://phabricator.wikimedia.org/F30478577> - no qualitative differences in the results/conclusions; - addition: taking care to elimina

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Here is the final version of the Report, including the timeline of the latest revids made for A, B, C, D, and E class items: F30435077: Wikidata Quality Report.nb.html <https://phabricator.wikimedia.org/F30435077> Please

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Here is a new version of the report with the Grading Scheme <https://www.wikidata.org/wiki/Wikidata:Item_quality#Grading_scheme> for Wikidata items included: F30434504: Wikidata Quality Report.nb.html <https://phabricator.wikimedia.org/

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher @RazShuty @WMDE-leszek Here's a prototype of a Wikidata Quality Report. F30430120: Wikidata Quality Report.nb.html <https://phabricator.wikimedia.org/F30430120> NEXT STEPS: - Include a bit more info on ORES in the

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-20 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: WMDE-leszek. GoranSMilovanovic added a comment. - Analytics/visualizations - **DONE.** @Lydia_Pintscher @RazShuty @WMDE-leszek Here's a glimpse of what we've found out thus far: 1. For all Wikidata items that have received an ORES quality

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-17 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Status: - working on analytics/visualizations now; - next steps: dashboard. TASK DETAIL https://phabricator.wikimedia.org/T195702 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Updated] T196193: Create KNIME nodes to interact with Wikidata

2019-09-17 Thread GoranSMilovanovic
GoranSMilovanovic removed a project: User-GoranSMilovanovic. TASK DETAIL https://phabricator.wikimedia.org/T196193 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Aklapper, GoranSMilovanovic, Lydia_Pintscher, abian

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-04 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Halfak Thank you, Aaron. TASK DETAIL https://phabricator.wikimedia.org/T195702 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: darthmon_wmde, Ladsgroup, elal, Halfak, RazShuty, hoo

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T195702: track quality of all/top 10000 Wikidata items over time

2019-08-27 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: darthmon_wmde. GoranSMilovanovic added a comment. @Ladsgroup @RazShuty @darthmon_wmde Amir, right before our meeting, what we need here is simple: - take a look at the sample data set in T195702#5208632 <https://phabricator.wikimedia.org/T195

[Wikidata-bugs] [Maniphest] [Commented On] T223119: WD Languages Landscape: fundamental statistics

2019-08-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher @RazShuty Something to begin with: - each node is a language (Wikimedia language codes <https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all> are used); - each language points towards the three most s

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-08-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - the Jaccard similarity and distance matrices: testing, the procedure is memory efficient but slow (subsetting the dgCMatrix class matrix...): - **DONE.** We can have the Jaccard distances here too. TASK DETAIL https://phabricator.wikimedia.org/T223118
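For reference, the Jaccard similarity and distance mentioned above can be sketched in plain Python. The original computation ran in R over sparse `dgCMatrix` matrices; the set-based representation here (each language as a set of identifiers of the items it covers) and the function names are assumptions made for illustration only.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; two empty sets are treated as identical (1.0) by convention."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance is the complement of the similarity."""
    return 1.0 - jaccard_similarity(a, b)
```

For two hypothetical languages covering items `{1, 2, 3}` and `{2, 3, 4}`, the similarity is 2/4 = 0.5 and the distance is 0.5.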

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Batch processing over sparse matrices (`dgCMatrix` class) is now employed to compute - the co-occurrence data set: **success**, using approximately an order of magnitude fewer resources than the previously employed procedure, and - the Jaccard similarity

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-08-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE Hm, this might be the solution - `dygraph` set to `dylegend(show = 'follow')`, please check: http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/ **Note.** This was the initial solution and there is one thing I don't like about it in spite

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-08-08 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE Strange. Lea, please let me know which browser you are using. I have tested the dashboard on Chromium and Mozilla Firefox under Ubuntu; the WDCM system, using the same front-end technology (RStudio Shiny), was tested over an even broader

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-08-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - given how often `stat1007` is used by analysts, - it barely has the resources for the computations that we need here (the languages x languages contingency table; takes at least ~25 GB to compute); - a fail-safe, batch processing procedure to compute

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-08-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE > Could you add the info why wikistats2 data differs from these graphs to the explanatory text? Done: dashboard <http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/>. > Apart from that I cannot see the total averag

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-08-04 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE The Total Average is back: dashboard <http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/>. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailprefe

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-08-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Milimetric Thanks for the clarification, Dan. @Lea_WMDE This implies that - the difference between the numbers reported on our Pageviews per namespace Dashboard <http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/> and Wiki

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T208567: Count Wikidata page views per page type

2019-08-01 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: Milimetric. GoranSMilovanovic added a comment. @Lea_WMDE Ok, here is a direct test (Pyspark code against the wmf.pageviews_hourly <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly> table): pw = sqlConte

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-08-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE So that is one order of magnitude and looks outright impossible. Please let me check. I guess a difference of this magnitude could not be a consequence of the fact that we have picked only four namespaces (Entity, Property

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-31 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE Take a look, please: http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/ TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE I am on it. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE, darthmon_wmde

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE - On the vertical axes the dashboard now uses `K`, `M`, and `B` for thousands, millions, and billions of pageviews, respectively. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org
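
The `K`/`M`/`B` convention described above can be illustrated with a small formatter. This is a Python sketch for illustration only, not the dashboard's own code (the dashboard is built with RStudio Shiny and `dygraph`); the function name `abbreviate` and its `decimals` parameter are hypothetical.

```python
def abbreviate(n, decimals=1):
    """Format a non-negative count with K/M/B suffixes (thousands, millions, billions)."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            # Scale the value down and keep a fixed number of decimal places.
            return f"{n / threshold:.{decimals}f}{suffix}"
    # Values below one thousand are shown unchanged.
    return str(n)


labels = [abbreviate(v) for v in (999, 1_500, 2_300_000, 4_000_000_000)]
```

Checking the largest threshold first means each value gets the coarsest suffix that applies, so axis labels stay short across the whole range.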

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE - The dashboard is now running a regular daily update; - fixing the axis labels now. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE I am on it, putting the dashboard on regular updates + fixing the labels to include decimal points. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Updated] T227701: Quantify additional information available via external identifiers

2019-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: Halfak. GoranSMilovanovic added a comment. @Lydia_Pintscher > That's "our" information. And then we have links/external identifiers to say 3 libraries that also have information about X. We want to somehow quantify the latter for all

[Wikidata-bugs] [Maniphest] [Updated] T227701: Quantify additional information available via external identifiers

2019-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher > We should try to find ways to quantify this information. Would you allow me to become creative in that respect and try to figure out what statistics we could offer publicly? > We should track it over time and p

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-14 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE You can now test your new dashboard: http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/ - Next steps: - introduce client-side dependency, and - put on a regular daily update as soon as - T227905 <ht

[Wikidata-bugs] [Maniphest] [Updated] T208567: Count Wikidata page views per page type

2019-07-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - data set review requested from #analytics <https://phabricator.wikimedia.org/tag/analytics/> in T227905 <https://phabricator.wikimedia.org/T227905>; - next steps: - visualizations + dashboard; - test, deploy. TASK DE

[Wikidata-bugs] [Maniphest] [Updated] T208567: Count Wikidata page views per page type

2019-07-12 Thread GoranSMilovanovic
GoranSMilovanovic added a subtask: T227905: Public Data Review Needed. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-08 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - data set production - completed. - Next steps: - orchestrate Pyspark from an R environment where post-processing will take place; - prepare data for visualizations and export to published data sets; - visualizations + dashboard; - test

[Wikidata-bugs] [Maniphest] [Commented On] T208567: Count Wikidata page views per page type

2019-07-08 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE I guess `640` is the `EntitySchema` namespace (figured this out from this Gerrit patch <https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseSchema/+/506471/2/extension.json>, since it is not documented in the Wikidata namespaces

[Wikidata-bugs] [Maniphest] [Claimed] T208567: Count Wikidata page views per page type

2019-07-03 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE, darthmon_wmde, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Commented On] T220977: Investigate surprising rise in mobile page views for wikidata

2019-06-17 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE Do we have any additional requirements here or shall we resolve the ticket? TASK DETAIL https://phabricator.wikimedia.org/T220977 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFs

2019-06-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Thanks for the recent `20190603` dump copy in HDFS. TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: abian, leila, Ottomata

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-05-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher @RazShuty @Halfak Ok, here's what I've got: item revision timestamp usage 1 Q36524 924799644 2019-04-26 06:29:25.0 6791020 2 Q54919 929383859 2019-04-30 21:14:14.0 4376000 3 Q423048 919180363 2019

[Wikidata-bugs] [Maniphest] [Commented On] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Thanks for feedback! @Lea_WMDE Given the current situation with the geo-localized edits (see T220977#5186818 <https://phabricator.wikimedia.org/T220977#5186818>), do you want me to proceed with the per continent analysis for pagevie

[Wikidata-bugs] [Maniphest] [Commented On] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE Here we go: - the following chart shows mobile edits vs. mobile pageviews separately for users and spiders; - what we can learn from this chart is that **the growth is certainly natural**, given that the spiders have made a minimal number

[Wikidata-bugs] [Maniphest] [Claimed] T195702: track quality of all/top 10000 Wikidata items over time

2019-05-16 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. GoranSMilovanovic added projects: WMDE-Analytics-Engineering, User-GoranSMilovanovic. TASK DETAIL https://phabricator.wikimedia.org/T195702 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] [Commented On] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lea_WMDE Yes, we do have a more or less steady increase in mobile edits on Wikidata: F29055561: MobileEdits_2019.png <https://phabricator.wikimedia.org/F29055561> TASK DETAIL https://phabricator.wikimedia.org/T220977 EMAIL PREFERENCES
