Joseph K. Bradley created SPARK-5564:
----------------------------------------

             Summary: Support sparse LDA solutions
                 Key: SPARK-5564
                 URL: https://issues.apache.org/jira/browse/SPARK-5564
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley


Latent Dirichlet Allocation (LDA) currently requires that the priors' concentration parameters be > 1.0. It should support values > 0.0, which should encourage sparser topics (phi) and document-topic distributions (theta).

For EM, this will require adding a projection to the M-step, as in: Vorontsov and Potapenko. "Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization." 2014.
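
To illustrate why a projection is needed, here is a minimal sketch in plain Python. It is not MLlib code. It assumes the M-step takes the MAP form in which each distribution is proportional to (expected count + concentration - 1). With a concentration below 1.0 that quantity can go negative, so the sketch clips it at zero before normalizing. The function name, the sample counts, and the uniform fallback for an all-zero row are hypothetical choices made for illustration only.

```python
def projected_m_step(expected_counts, concentration):
    """Return one normalized distribution from expected counts.

    Assumed update: value_i is proportional to
    max(count_i + concentration - 1, 0).
    """
    # Shift by (concentration - 1); this is negative when concentration < 1.
    shifted = [max(c + concentration - 1.0, 0.0) for c in expected_counts]
    total = sum(shifted)
    if total == 0.0:
        # Assumption: fall back to uniform if every entry was clipped to zero.
        n = len(expected_counts)
        return [1.0 / n] * n
    return [s / total for s in shifted]


# Hypothetical expected word counts for a single topic.
topic_word_counts = [5.0, 0.3, 0.0, 2.7]

# Concentration below 1.0: small counts are clipped, giving exact zeros.
phi_sparse = projected_m_step(topic_word_counts, concentration=0.5)

# Concentration above 1.0: every entry stays strictly positive.
phi_dense = projected_m_step(topic_word_counts, concentration=1.5)
```

In this sketch the entries whose counts fall below (1 - concentration) become exactly zero, which is the sparsity effect the issue describes for phi and theta.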