zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM
supports instance weighting
URL: https://github.com/apache/spark/pull/26735#issuecomment-568451629
@huaxingao I found that the difference comes from how `sumWeights` is
computed.
In master, the weights of each cluster are kept and the sum is obtained via
`sums.weights.sum`,
while in the previous commit I used a separate variable to keep the sum. The
difference is tiny, but unfortunately it causes a sudden divergence at
iteration 7 (although the two implementations eventually converge to the same
result).
I have reverted the computation of `sumWeights`, and the doctest now works fine.
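To illustrate why the two ways of computing the sum need not agree bit for bit, here is a minimal sketch in plain Python. It is not the Spark code: the function names, the random data, and the assumption that each cluster weight is accumulated as instance weight times responsibility are all hypothetical and chosen only for illustration.

```python
import math
import random


def sum_via_cluster_weights(weights, resp):
    """Accumulate one weight per cluster, then sum the cluster weights.

    Assumption: each cluster weight is the sum of w_i * r_ij over instances.
    """
    k = len(resp[0])
    cluster_weights = [0.0] * k
    for w, r in zip(weights, resp):
        for j in range(k):
            cluster_weights[j] += w * r[j]
    return sum(cluster_weights)


def sum_via_separate_var(weights):
    """Accumulate the instance weights directly in a single variable."""
    total = 0.0
    for w in weights:
        total += w
    return total


# Made-up data: 10000 instances, 3 clusters, responsibilities normalized per row.
random.seed(0)
n, k = 10000, 3
weights = [random.uniform(0.5, 2.0) for _ in range(n)]
resp = []
for _ in range(n):
    raw = [random.random() for _ in range(k)]
    s = sum(raw)
    resp.append([x / s for x in raw])

a = sum_via_cluster_weights(weights, resp)
b = sum_via_separate_var(weights)
# The two results are mathematically equal but may differ in the last bits,
# because floating-point addition is not associative.
print(a, b, abs(a - b))
```

Any gap between the two values is on the order of rounding error, but an iterative algorithm can amplify such a gap over several iterations.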
@srowen I think this PR should be OK to merge. However, as discussed above,
the convergence of GMM is not stable: the log-likelihood may even drop sharply
during training (which should not happen in theory). I think we need to make
GMM more numerically stable in the future, but as far as supporting instance
weighting is concerned, I think the current PR is OK.
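One way to spot such drops is to scan the per-iteration log-likelihood values for decreases. The following is a minimal sketch in plain Python; the helper `find_loglik_drops`, its tolerance, and the trace values are hypothetical and are not taken from Spark or from this PR.

```python
def find_loglik_drops(log_likelihoods, tol=1e-8):
    """Return (iteration, previous, current) for every step where the
    log-likelihood decreases by more than tol.

    Iterations are numbered from 0, so the first possible drop is at 1.
    """
    drops = []
    pairs = zip(log_likelihoods, log_likelihoods[1:])
    for i, (prev, cur) in enumerate(pairs, start=1):
        if cur < prev - tol:
            drops.append((i, prev, cur))
    return drops


# Made-up trace for illustration only: the value falls at iteration 2.
trace = [-10.0, -9.0, -9.5, -8.0]
drops = find_loglik_drops(trace)
print(drops)
```

An empty result means the trace is non-decreasing up to the tolerance, which is what EM guarantees in exact arithmetic.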