GitHub user witgo commented on the pull request:

    https://github.com/apache/spark/pull/2388#issuecomment-55617366
  
    A brief explanation:
    
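    * Graph structure
      
        Vertices are words (the source vertex) and documents (the target
        vertex). Each edge carries the topics (an array of topic IDs)
        assigned to that word's occurrences in the document.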
    
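    * Training process
    
        1. Initialization: build `RDD[Edge[ED]]` from the sparse word
           vector of each document. The edge attributes (stored as arrays)
           are initialized from a uniform distribution.
    
        2. From the edge attributes (topic arrays), build the vertex
           attributes (document or word topic counts, stored as sparse
           vectors) and the corpus topic counts (stored as a vector).
    
        3. Run Gibbs sampling based on the vertex attributes (document and
           word topic counts, plus the corpus topic counts), and use the
           sampled result as the new edge attributes.
    
        4. Repeat steps 2 and 3 an appropriate number of times.
    
        5. Use the word vertex attributes (word topic counts) and the
           corpus topic counts to initialize the `TopicModel` class.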
    
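    * Inference process
    
        1. Initialize the document topic distribution as uniform.
    
        2. Run Gibbs sampling with the `TopicModel` class (word topic
           counts, corpus topic counts, and document topic counts) to
           obtain a new document topic distribution.
    
        3. Repeat step 2 `totalIter` times and output the average of the
           results from the last `burnInIter` iterations.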
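
The training steps above can be sketched roughly as follows. This is a minimal single-machine illustration in plain Python, not the GraphX code from the pull request: the function names, the dict-based stand-in for `RDD[Edge[ED]]`, and the `alpha`/`beta` smoothing parameters are all assumptions made for the example. It mirrors the described loop by rebuilding the counts from the edges and then resampling every edge against those fixed counts, which is a batch approximation of collapsed Gibbs sampling.

```python
import random
from collections import Counter


def build_edges(docs, num_topics, rng):
    """Step 1: one edge per (word, document) pair. The edge attribute holds
    one topic ID per occurrence, drawn uniformly at random."""
    edges = {}
    for doc_id, word_counts in enumerate(docs):
        for word, count in word_counts.items():
            edges[(word, doc_id)] = [rng.randrange(num_topics) for _ in range(count)]
    return edges


def build_counts(edges, num_topics):
    """Step 2: aggregate edge topics into word, document and corpus counts."""
    word_topics, doc_topics = {}, {}
    corpus_topics = [0] * num_topics
    for (word, doc_id), topics in edges.items():
        for t in topics:
            word_topics.setdefault(word, Counter())[t] += 1
            doc_topics.setdefault(doc_id, Counter())[t] += 1
            corpus_topics[t] += 1
    return word_topics, doc_topics, corpus_topics


def resample_edges(edges, counts, num_topics, vocab_size, alpha, beta, rng):
    """Step 3: draw a new topic for every token from the aggregated counts."""
    word_topics, doc_topics, corpus_topics = counts
    new_edges = {}
    for (word, doc_id), topics in edges.items():
        new_topics = []
        for old in topics:
            weights = []
            for t in range(num_topics):
                own = 1 if t == old else 0  # leave out the token's own assignment
                n_wt = word_topics[word][t] - own
                n_dt = doc_topics[doc_id][t] - own
                n_t = corpus_topics[t] - own
                weights.append((n_dt + alpha) * (n_wt + beta) / (n_t + vocab_size * beta))
            new_topics.append(rng.choices(range(num_topics), weights=weights)[0])
        new_edges[(word, doc_id)] = new_topics
    return new_edges


def train(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """docs: list of sparse word vectors, each a dict mapping word -> count."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    edges = build_edges(docs, num_topics, rng)
    for _ in range(iterations):  # step 4: repeat steps 2 and 3
        counts = build_counts(edges, num_topics)
        edges = resample_edges(edges, counts, num_topics, vocab_size, alpha, beta, rng)
    # Step 5: the word topic counts and corpus topic counts form the model.
    word_topics, _, corpus_topics = build_counts(edges, num_topics)
    return word_topics, corpus_topics
```

Calling `train([{"a": 2, "b": 1}, {"b": 3, "c": 1}], num_topics=2, iterations=5)` returns the word topic counts and the corpus topic counts, which correspond to the inputs the comment says are used to initialize `TopicModel`.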

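The inference steps described in the comment could look something like the sketch below. It is a hedged illustration in plain Python, not the `TopicModel` implementation from the pull request: the `infer` function, its parameter names, the `alpha`/`beta` smoothing values, and the choice to hold the trained word and corpus counts fixed are all assumptions. The averaging window follows the comment's wording, taking the last `burn_in_iter` of `total_iter` sweeps.

```python
import random


def infer(doc_words, word_topics, corpus_topics, vocab_size,
          total_iter=50, burn_in_iter=10, alpha=0.1, beta=0.01, seed=0):
    """doc_words: list of tokens in the new document.
    word_topics: dict mapping word -> list of per-topic counts (trained model).
    corpus_topics: list of per-topic totals (trained model).
    Returns the averaged document topic distribution."""
    rng = random.Random(seed)
    k = len(corpus_topics)

    # Step 1: start from uniformly random topic assignments.
    assignments = [rng.randrange(k) for _ in doc_words]
    doc_topics = [0] * k
    for t in assignments:
        doc_topics[t] += 1

    averaged = [0.0] * k
    norm = len(doc_words) + k * alpha
    for it in range(total_iter):  # step 3: repeat the sampling sweep
        # Step 2: resample each token; the model counts stay fixed (assumption).
        for i, word in enumerate(doc_words):
            doc_topics[assignments[i]] -= 1
            n_w = word_topics.get(word, [0] * k)  # unseen words get zero counts
            weights = [
                (doc_topics[t] + alpha) * (n_w[t] + beta)
                / (corpus_topics[t] + vocab_size * beta)
                for t in range(k)
            ]
            assignments[i] = rng.choices(range(k), weights=weights)[0]
            doc_topics[assignments[i]] += 1
        # Average the smoothed distribution over the last burn_in_iter sweeps.
        if it >= total_iter - burn_in_iter:
            for t in range(k):
                averaged[t] += (doc_topics[t] + alpha) / norm / burn_in_iter
    return averaged
```

The function assumes `1 <= burn_in_iter <= total_iter`; under that condition the returned values sum to 1 up to floating-point error.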

