Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/7705#issuecomment-125693137
  
    I believe the issue @hhbyyh is raising is as follows:
    * To compute log-perplexity, we need to compute the probability of each 
word in the test document(s).
    * To compute that probability, we need to do inference.
    * Should inference be done (a) the same way as during learning (e.g., use 
variational inference if the model was produced by Online VB LDA), or (b) the 
same way for a given type of model (e.g., use inference type X for a 
LocalLDAModel, regardless of whether it was produced by online or EM), or (c) 
the same way for all models?
    
    My vote would be for (b) or (c), simply because both mean implementing 
only one type of inference for now.  In the future, we could add the option to 
use different inference methods.
    
    One caveat: (a) seems like the "right" way to do things statistically: 
using the same procedure during testing as during training.  But I don't think 
we have time to do it that way.  We could mark the method as DeveloperApi if we 
want to switch to (a) in the future.


