[Wikidata-bugs] [Maniphest] [Commented On] T223118: WD Languages Landscape: fundamental data sets

2019-10-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  - UNESCO and Ethnologue Language Status: **solved**.
  - Number of speakers: **solved**.

TASK DETAIL
  https://phabricator.wikimedia.org/T223118

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic, darthmon_wmde, 
DannyS712, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs



2019-10-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  - Script variants: **solved**.




2019-08-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  - The Jaccard similarity and distance matrices (previously in testing: the 
procedure is memory efficient but slow, due to subsetting the `dgCMatrix` 
class matrix): **DONE.** We can have the Jaccard distances here too.
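
A minimal sketch of how Jaccard similarity and distance matrices can be derived from a sparse incidence matrix, assuming the standard definitions J(A, B) = |A ∩ B| / |A ∪ B| and distance = 1 - J. The toy items x languages matrix and the `jaccard()` helper are hypothetical and only illustrate the idea; they are not the procedure used in this task.

```r
library(Matrix)

# Hypothetical toy input: items x languages incidence matrix (1 = item uses language).
incidence <- sparseMatrix(
  i = c(1, 1, 2, 2, 2, 3, 4, 4),
  j = c(2, 1, 2, 1, 3, 3, 2, 3),
  x = rep(1, 8),
  dims = c(4, 3),
  dimnames = list(paste0("Q", 1:4), c("de", "en", "sr"))
)

# Hypothetical helper: Jaccard similarity and distance between columns.
jaccard <- function(m) {
  co <- as.matrix(crossprod(m))           # intersection sizes, languages x languages
  sizes <- diag(co)                       # number of items per language
  union <- outer(sizes, sizes, "+") - co  # |A| + |B| - |A and B|
  similarity <- co / union                # NaN where a language has no items at all
  list(similarity = similarity, distance = 1 - similarity)
}

result <- jaccard(incidence)
```

Here the languages x languages result is converted to a dense matrix, which is fine for a few hundred languages; the large dimension (items) never leaves the sparse representation.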




2019-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  - Batch processing over sparse matrices (`dgCMatrix` class) is now employed 
to compute:
    - the co-occurrence data set: **success**, using approximately an order of 
magnitude fewer resources than the previously employed procedure, and
    - the Jaccard similarity and distance matrices: **testing**, the procedure 
is memory efficient but slow (subsetting the `dgCMatrix` class matrix...).
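
A rough sketch of what batch processing over a `dgCMatrix` could look like for the co-occurrence counts. The toy items x languages matrix, the function name `cooccurrence_in_batches()`, and the batch size are assumptions made for illustration, not the code used in this task.

```r
library(Matrix)

# Hypothetical toy input: items x languages incidence matrix (1 = item uses language).
incidence <- sparseMatrix(
  i = c(1, 1, 2, 2, 2, 3, 4, 4),
  j = c(2, 1, 2, 1, 3, 3, 2, 3),
  x = rep(1, 8),
  dims = c(4, 3),
  dimnames = list(paste0("Q", 1:4), c("de", "en", "sr"))
)

# Sum the per-batch cross-products; each batch is a block of rows (items).
cooccurrence_in_batches <- function(m, batch_size = 2L) {
  result <- NULL
  for (start in seq(1L, nrow(m), by = batch_size)) {
    rows <- start:min(start + batch_size - 1L, nrow(m))
    # Subsetting a dgCMatrix by rows can be slow, since it is stored by column.
    block <- crossprod(m[rows, , drop = FALSE])
    result <- if (is.null(result)) block else result + block
  }
  result
}

cooccurrence <- cooccurrence_in_batches(incidence)
```

Because the cross-product is additive over disjoint blocks of rows, the batched sum equals `crossprod(incidence)` computed in one step, while only one block is materialized at a time.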




2019-08-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  - Given how often `stat1007` is used by analysts, it barely has the resources 
for the computations that we need here (the languages x languages contingency 
table takes at least ~25 GB to compute).
  - A fail-safe, batch processing procedure to compute large contingency 
matrices in R will be developed.
  - It will rely on `base` and/or `data.table` R functions, but it will be less 
demanding in terms of memory resources.
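
One possible shape for such a batch procedure, sketched in `base` R only. The input layout (one row per item/language pair), the function name `batch_contingency()`, and the batch size are assumptions for illustration; the procedure eventually developed may differ.

```r
# Hypothetical toy input: one row per (item, language) pair.
pairs <- data.frame(
  item     = c("Q1", "Q1", "Q2", "Q2", "Q2", "Q3", "Q4", "Q4"),
  language = c("en", "de", "en", "de", "sr", "sr", "en", "sr"),
  stringsAsFactors = FALSE
)

# Accumulate the languages x languages table one batch of items at a time,
# so that only a (batch_size x n_languages) block is expanded in memory.
batch_contingency <- function(pairs, batch_size = 2L) {
  langs <- sort(unique(pairs$language))
  items <- unique(pairs$item)
  result <- matrix(0, nrow = length(langs), ncol = length(langs),
                   dimnames = list(langs, langs))
  batches <- split(items, ceiling(seq_along(items) / batch_size))
  for (batch in batches) {
    chunk <- pairs[pairs$item %in% batch, ]
    # items x languages 0/1 incidence block for this batch only
    block <- table(chunk$item, factor(chunk$language, levels = langs))
    block <- (unclass(block) > 0) * 1
    # t(block) %*% block counts the items shared by each pair of languages
    result <- result + crossprod(block)
  }
  result
}

contingency <- batch_contingency(pairs)
```

The diagonal holds the number of items per language and the off-diagonal cells hold the number of items shared by two languages. Peak memory is governed by `batch_size` rather than by the total number of items.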
