Hi Cui,

Try reading the Scala version of LDAExample:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
The matrix you're referring to is the corpus after vectorization: each row is one document, and each column holds the count of one dictionary word in that document.

For example, given the dictionary [apple, orange, banana] and 3 documents:

Apple orange
Orange banana
Apple banana

the corpus can be represented by the dense vectors:

1, 1, 0
0, 1, 1
1, 0, 1

Cheers,
Yuhao

-----Original Message-----
From: Cui xp [mailto:lifeiniao...@gmail.com]
Sent: Wednesday, May 6, 2015 4:28 PM
To: user@spark.apache.org
Subject: The explanation of input text format using LDA in Spark

Hi all,

After I read the example code using LDA in Spark, I found that the input text in the code is a matrix. The format of the text is as follows:

1 2 6 0 2 3 1 1 0 0 3
1 3 0 1 3 0 0 2 0 0 1
1 4 1 0 0 4 9 0 1 2 0
2 1 0 3 0 0 5 0 2 3 9
3 1 1 9 3 0 2 0 0 1 3
4 2 0 3 4 5 1 1 1 4 0
2 1 0 3 0 0 5 0 2 2 9
1 1 1 9 2 1 2 0 0 1 3
4 4 0 3 4 2 1 3 0 0 0
2 8 2 0 3 0 2 0 2 7 2
1 1 1 9 0 2 2 0 0 3 3
4 1 0 0 4 5 1 3 0 1 0

But I don't know the meaning of each line or each column. And if I have several text documents, how do I process them to use LDA in Spark?

Thanks.
Cui xp

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-explanation-of-input-text-format-using-LDA-in-Spark-tp22781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
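
The vectorization step described in the reply at the top of this thread can be sketched roughly as follows. This is a minimal, dependency-free illustration in plain Python, not the code from LDAExample: the function name `vectorize`, the whitespace tokenization, and the lowercasing are assumptions made for the sketch. It counts how often each dictionary word occurs in each document and writes one space-separated row per document, which is the same shape as the matrix in the question.

```python
from collections import Counter


def vectorize(documents, vocabulary):
    """Turn each document into a row of term counts, one column per vocabulary word.

    Hypothetical helper for illustration only; tokenization is a plain
    whitespace split with lowercasing, which is an assumption of this sketch.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    rows = []
    for text in documents:
        counts = Counter(token.lower() for token in text.split())
        row = [0] * len(vocabulary)
        for word, n in counts.items():
            if word in index:  # words outside the vocabulary are ignored
                row[index[word]] = n
        rows.append(row)
    return rows


# The dictionary and documents from the reply.
vocabulary = ["apple", "orange", "banana"]
documents = ["Apple orange", "Orange banana", "Apple banana"]

rows = vectorize(documents, vocabulary)

# One line per document, counts separated by spaces.
lines = [" ".join(str(n) for n in row) for row in rows]
print("\n".join(lines))
```

In Spark, each such row would typically be parsed into a vector (for instance with `Vectors.dense` from `pyspark.mllib.linalg`) and paired with a document id before being passed to LDA; that step is left out here so the sketch runs without a Spark installation.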