[GitHub] [spark] zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting

2019-12-23 Thread GitBox
zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM 
supports instance weighting
URL: https://github.com/apache/spark/pull/26735#issuecomment-568451629
 
 
   @huaxingao I found that the difference comes from the way the variable 
`sumWeights` is computed.
   In master, the weights of each cluster are kept, and the sum is obtained via 
`sums.weights.sum`,
   while in the previous commit I used a separate variable to keep the sum. The 
difference is tiny, but it unfortunately causes a sudden divergence at 
iter-7 (although the two implementations finally converge to the same result).
   I reverted the computation of `sumWeights`, and now the doctest works fine.
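
The effect described above can be illustrated with a minimal Python sketch. It is not the Spark implementation: the function names, the responsibilities, and the instance weights are hypothetical, chosen only to show that summing per-cluster weights and keeping a separate running total add the same quantities in a different order, which floating-point arithmetic does not guarantee to give bit-identical results.

```python
import math

def sum_via_clusters(resp, weights):
    """Accumulate a weight per cluster, then sum the per-cluster totals.

    resp[i][j] is a hypothetical responsibility of cluster j for instance i;
    weights[i] is the hypothetical instance weight.
    """
    k = len(resp[0])
    cluster_weights = [0.0] * k
    for r, w in zip(resp, weights):
        for j in range(k):
            cluster_weights[j] += w * r[j]
    return sum(cluster_weights)

def sum_via_accumulator(weights):
    """Keep a single separate running total of the instance weights."""
    total = 0.0
    for w in weights:
        total += w
    return total

# Illustrative data only, not taken from the PR or the doctest.
resp = [[0.3, 0.7], [0.5, 0.5], [0.9, 0.1]]
weights = [0.1, 0.2, 0.3]

a = sum_via_clusters(resp, weights)
b = sum_via_accumulator(weights)

# The two totals agree up to rounding, but need not be bit-identical.
print(math.isclose(a, b, rel_tol=1e-12))

# Floating-point addition is not associative, so the order of summation matters.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

A difference of this size is normally harmless, but in an iterative procedure such as EM it can be amplified from one iteration to the next.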
   
   @srowen I think this PR should be OK to merge. However, as discussed above, 
we can see that the convergence of GMM is not stable: the log-likelihood may 
even drop sharply during training (which should not happen in 
theory). I think we need to make GMM more numerically stable in the future, but 
as far as supporting instance weighting is concerned, I think the current PR is OK.
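
One way to spot such drops is to scan the per-iteration log-likelihood values for decreases. The helper below is a hypothetical sketch (the function name and the sample trace are made up for illustration and are not measurements from this PR); it assumes the log-likelihood of each iteration has been collected into a list.

```python
def find_loglik_drops(logliks, tol=0.0):
    """Return (iteration, previous, current) for every step where the
    log-likelihood decreases by more than `tol`.

    Iteration numbers are 1-based and refer to the later value of each pair.
    """
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(logliks, logliks[1:]), start=1)
        if cur < prev - tol
    ]

# Made-up trace: the value at iteration 2 is lower than at iteration 1.
trace = [-10.0, -8.0, -9.5, -7.0]
print(find_loglik_drops(trace))  # [(2, -8.0, -9.5)]
```

A small positive `tol` can be used to ignore decreases that are within rounding error.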


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting

2019-12-02 Thread GitBox
zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM 
supports instance weighting
URL: https://github.com/apache/spark/pull/26735#issuecomment-560981773
 
 
   Something seems to be wrong in the Python doctests.
   1. I manually tested some Scala cases/examples against both 2.4.4 and this PR; the 
results are as expected.
   2. I manually ran the Python doctest in 2.4.4, and the result differs from 
the currently expected value:
   
![image](https://user-images.githubusercontent.com/7322292/70017954-8e62d500-15bf-11ea-8dd0-81ca1ac98c51.png)
   3. I manually ran the Python doctest in this PR, and the result is the same as in 
2.4.4:
   
![image](https://user-images.githubusercontent.com/7322292/70018006-b2beb180-15bf-11ea-9cfc-329021b53c71.png)
   
   I think I need to look into this.

