Re: LDA spark ML visualization

2016-09-13 Thread janardhan shetty
Any help with this problem would be appreciated.
On Sep 12, 2016 11:45 AM, "janardhan shetty" wrote:



LDA spark ML visualization

2016-09-12 Thread janardhan shetty
Hi,

I am trying to visualize an LDA model built with Spark 2.0 ML (Scala) in
LDAvis.

Are there any resources on converting the Spark model parameters into the
following five inputs that LDAvis needs for visualization?

1. φ, the K × W matrix containing the estimated probability mass function
over the W terms in the vocabulary for each of the K topics in the model.
Note that φ_kw > 0 for all k ∈ 1...K and all w ∈ 1...W, because of the
priors (although LDAvis accepts values of zero due to rounding). Each of
the K rows of φ must sum to one.
2. θ, the D × K matrix containing the estimated probability mass function
over the K topics in the model for each of the D documents in the corpus.
Note that θ_dk > 0 for all d ∈ 1...D and all k ∈ 1...K, because of the
priors (although, as above, LDAvis accepts zeros due to rounding). Each of
the D rows of θ must sum to one.
3. n_d, the number of tokens observed in document d, for documents
d = 1...D; each n_d must be an integer greater than zero. Called
doc.length in the LDAvis code.
4. vocab, the length-W character vector containing the terms in the
vocabulary (listed in the same order as the columns of φ).
5. M_w, the frequency of term w across the entire corpus, for terms
w = 1...W; each M_w must be an integer greater than zero. Called
term.frequency in the LDAvis code.
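For item 1, Spark ML exposes `LDAModel.topicsMatrix`, documented as a vocabSize × k matrix with one topic per column. Depending on the optimizer, those columns may hold unnormalized weights rather than probabilities, so φ can be obtained by dividing each column by its sum and transposing. The Python sketch below assumes the matrix has already been flattened in column-major order (the layout Scala's `Matrix.toArray` returns); the function name `phi_from_topics_matrix` and the sample weights are hypothetical.

```python
from typing import List, Sequence


def phi_from_topics_matrix(values: Sequence[float], vocab_size: int, k: int) -> List[List[float]]:
    """Turn a vocab_size x k topic-term matrix, given as a flat column-major
    array, into the k x vocab_size matrix phi whose rows sum to one.

    Assumption (hypothetical helper): column j holds non-negative, possibly
    unnormalized weights of topic j over the vocabulary.
    """
    if len(values) != vocab_size * k:
        raise ValueError("expected vocab_size * k values")
    phi: List[List[float]] = []
    for j in range(k):
        # In column-major order, column j occupies values[j*W : (j+1)*W].
        column = values[j * vocab_size:(j + 1) * vocab_size]
        total = sum(column)
        if total <= 0:
            raise ValueError(f"topic {j} has no positive mass")
        # Divide by the column sum so row j of phi is a probability mass function.
        phi.append([w / total for w in column])
    return phi


# Hypothetical 3-term, 2-topic matrix, flattened column by column.
weights = [2.0, 1.0, 1.0,   # topic 0
           0.5, 0.5, 3.0]   # topic 1
phi = phi_from_topics_matrix(weights, vocab_size=3, k=2)
print(phi)  # [[0.5, 0.25, 0.25], [0.125, 0.125, 0.75]]
```

Normalizing here, rather than relying on the raw matrix, keeps the "each row of φ sums to one" requirement true regardless of which optimizer produced the model.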
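Items 2 and 3 must describe the same documents in the same order. One way to keep them aligned, assuming the corpus was vectorized with `CountVectorizer`, is to collect the LDA features column (by default `features`) together with the `topicDistribution` column that `model.transform(...)` adds, and derive θ and n_d from each row in a single pass. The sketch below assumes each sparse feature vector has already been converted to a Python dict from term index to count; the function name `theta_and_doc_length` and the sample rows are hypothetical. Documents with no tokens are dropped from both outputs, because LDAvis requires n_d > 0.

```python
from typing import Dict, List, Sequence, Tuple

# One collected row: (term index -> count, topic distribution for that document).
Row = Tuple[Dict[int, float], Sequence[float]]


def theta_and_doc_length(rows: Sequence[Row]) -> Tuple[List[List[float]], List[int]]:
    """Build theta (D x K) and doc.length (length D) from aligned rows."""
    theta: List[List[float]] = []
    doc_length: List[int] = []
    for counts, topic_dist in rows:
        # CountVectorizer stores counts as doubles; round back to integers.
        n_d = int(round(sum(counts.values())))
        if n_d <= 0:
            continue  # empty document: LDAvis needs n_d > 0, so skip it entirely
        total = sum(topic_dist)
        # Renormalize to absorb floating-point drift so each row sums to one.
        theta.append([p / total for p in topic_dist])
        doc_length.append(n_d)
    return theta, doc_length


rows = [
    ({0: 2.0, 2: 1.0}, [0.9, 0.1]),
    ({}, [0.5, 0.5]),  # empty document, dropped
    ({1: 4.0}, [0.25, 0.75]),
]
theta, doc_length = theta_and_doc_length(rows)
print(doc_length)  # [3, 4]
```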
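For items 4 and 5, `CountVectorizerModel.vocabulary` lists the terms in feature-index order, which is also the row order of `topicsMatrix`, so it can be used directly as vocab. M_w is then the per-term total of the same count vectors. A sketch, again assuming dict-shaped count vectors; the function name `term_frequency` and the sample data are hypothetical. If the vocabulary was fitted on the same corpus, every M_w should already be positive, so the check mainly catches a vocabulary/corpus mismatch.

```python
from typing import Dict, List, Sequence


def term_frequency(doc_counts: Sequence[Dict[int, float]], vocab: Sequence[str]) -> List[int]:
    """Sum each term's count over all documents (M_w), indexed like vocab."""
    freq = [0.0] * len(vocab)
    for counts in doc_counts:
        for w, c in counts.items():
            freq[w] += c
    # Counts arrive as doubles; LDAvis expects integers.
    result = [int(round(f)) for f in freq]
    missing = [vocab[w] for w, m in enumerate(result) if m <= 0]
    if missing:
        raise ValueError(f"terms never observed in the corpus: {missing}")
    return result


vocab = ["spark", "lda", "topic"]
docs = [{0: 2.0, 2: 1.0}, {1: 4.0, 0: 1.0}]
print(term_frequency(docs, vocab))  # [3, 4, 1]
```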
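Since LDAvis is an R package, the five inputs still have to be moved from Spark to R. A minimal sketch, assuming JSON as the interchange format (a choice made here, not something LDAvis requires): check the shapes listed above and write everything to one file. The function name `write_ldavis_input`, the file name, and the field names are hypothetical.

```python
import json
import os
import tempfile
from typing import Sequence


def write_ldavis_input(path: str, phi: Sequence[Sequence[float]],
                       theta: Sequence[Sequence[float]], doc_length: Sequence[int],
                       vocab: Sequence[str], term_frequency: Sequence[int]) -> None:
    """Check the shapes LDAvis expects and dump the five inputs as JSON."""
    k, w = len(phi), len(vocab)
    if any(len(row) != w for row in phi):
        raise ValueError("each row of phi must have one entry per vocab term")
    if any(len(row) != k for row in theta):
        raise ValueError("each row of theta must have one entry per topic")
    if len(doc_length) != len(theta):
        raise ValueError("doc_length must have one entry per row of theta")
    if len(term_frequency) != w:
        raise ValueError("term_frequency must have one entry per vocab term")
    payload = {
        "phi": [list(row) for row in phi],
        "theta": [list(row) for row in theta],
        "doc_length": list(doc_length),
        "vocab": list(vocab),
        "term_frequency": list(term_frequency),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


# Tiny hypothetical model: 1 topic, 2 terms, 1 document.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "ldavis_input.json")
    write_ldavis_input(out, phi=[[0.5, 0.5]], theta=[[1.0]], doc_length=[2],
                       vocab=["a", "b"], term_frequency=[1, 1])
    with open(out, encoding="utf-8") as f:
        loaded = json.load(f)
print(sorted(loaded))
```

On the R side, `jsonlite::fromJSON` simplifies nested numeric arrays into matrices by default, and the fields can then be passed to `LDAvis::createJSON(phi, theta, doc.length, vocab, term.frequency)`.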