[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
- UNESCO and Ethnologue Language Status: **solved**.
- Number of speakers: **solved**.
TASK DETAIL https://phabricator.wikimedia.org/T223118

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-10-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
- Script variants: **solved**.

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-08-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
- The Jaccard similarity and distance matrices: testing; the procedure is memory-efficient but slow (subsetting the `dgCMatrix` class matrix...):
  - **DONE.** We can have the Jaccard distances here too.
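
A minimal sketch of how Jaccard similarity and distance matrices can be computed in column batches from a sparse binary matrix. It is written in Python with SciPy's CSC format (the counterpart of R's `dgCMatrix`) and is not the R code used in the task. The function name `jaccard_in_batches`, the items x languages layout, and the batch size are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse


def jaccard_in_batches(X, batch_size=1000):
    """Jaccard similarity and distance between the columns of X.

    Assumption: X is an items x languages incidence matrix in which any
    nonzero entry means "this item is present for this language".
    """
    # Binarize and store column-wise, so column slices are cheap.
    X = sparse.csc_matrix(X).astype(bool).astype(np.int64)
    n = X.shape[1]
    col_sums = np.asarray(X.sum(axis=0)).ravel()  # set size per column
    sim = np.zeros((n, n))
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Intersection sizes between this batch of columns and all columns.
        inter = (X[:, start:stop].T @ X).toarray()
        # |A union B| = |A| + |B| - |A intersect B|
        union = col_sums[start:stop, None] + col_sums[None, :] - inter
        # Pairs of empty columns have union 0; their similarity is set to 0.
        sim[start:stop, :] = np.divide(
            inter, union, out=np.zeros(inter.shape), where=union > 0
        )
    return sim, 1.0 - sim


# Toy data: 4 items (rows) x 3 languages (columns).
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 1, 0]]
similarity, distance = jaccard_in_batches(X, batch_size=2)
```

Only one batch of intersections is densified at a time, which keeps peak memory bounded by `batch_size` x (number of columns) at the cost of repeated column subsetting.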

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
- Batch processing over sparse matrices (`dgCMatrix` class) is now employed to compute
  - the co-occurrence data set: **success**, using approximately an order of magnitude fewer resources than the previously employed procedure, and
  - the Jaccard similarity and
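
For illustration, batch processing of a co-occurrence data set over a sparse matrix might look like the sketch below. It uses Python with SciPy rather than the R `dgCMatrix` code referred to above. The function name `cooccurrence_in_batches`, the items x languages layout, and the row-batch size are hypothetical choices, not details taken from the task.

```python
import numpy as np
from scipy import sparse


def cooccurrence_in_batches(X, batch_rows=1000):
    """Languages x languages co-occurrence counts, accumulated per row batch.

    Assumption: X is an items x languages incidence matrix; entry (i, j) of
    the result counts the items present in both language i and language j.
    """
    # Binarize and store row-wise, so row slices are cheap.
    X = sparse.csr_matrix(X).astype(bool).astype(np.int64)
    n_langs = X.shape[1]
    total = sparse.csr_matrix((n_langs, n_langs), dtype=np.int64)
    for start in range(0, X.shape[0], batch_rows):
        chunk = X[start:start + batch_rows, :]
        # X^T X equals the sum of chunk^T chunk over disjoint row chunks.
        total = total + chunk.T @ chunk
    return total


# Toy data: 4 items (rows) x 3 languages (columns).
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 1, 0]]
counts = cooccurrence_in_batches(X, batch_rows=3)
```

Because each step only multiplies one chunk of rows, a failed run can in principle be resumed from the last completed chunk if the running total is saved between batches.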

[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-08-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.
- given how often `stat1007` is used by analysts,
- it barely has the resources for the computations that we need here (the languages x languages contingency table; takes at least ~25 GB to compute);
- a fail-safe, batch processing procedure to compute