huaxingao edited a comment on issue #26735: [SPARK-30102][WIP][ML][PYSPARK] GMM supports instance weighting
URL: https://github.com/apache/spark/pull/26735#issuecomment-563413620

Instead of changing maxIter to 5 and comparing the logLikelihood at iteration 5, maybe use a much bigger maxIter so that the model converges, and compare the logLikelihood at convergence.

It puzzles me why the logLikelihoods from iteration 7 onward are so different from the ones computed using the original code. Weight is not set in the Python doctest, so it uses the default of 1.0. In theory, this should behave exactly the same as the original code, and the logLikelihood at each iteration should be very close to the one computed using the original code, right? I tried both the original code and the code with the changes: they start to have different logLikelihoods at iteration 7, but both converge at iteration 25 with the same logLikelihood, 65.02945125241477.

I agree that we probably need to change the current convergence check. It seems to me that we also need to compare the logLikelihood difference to the previous difference. The differences should get smaller and smaller and eventually converge. However, when I tested with the current code, the logLikelihood differences did not get smaller consistently:

| iteration | logLikelihoodPrev | logLikelihood | diff |
| --------- | ----------------- | ------------- | ---- |
| 15 | 36.402816949681664 | 36.55682231506764 | 0.1540053653859772 |
| 16 | 36.55682231506764 | 36.75888971475007 | 0.20206739968242715 |
| 17 | 36.75888971475007 | 37.581643170088086 | 0.8227534553380167 |
| 18 | 37.581643170088086 | 6.674670202869423 | 30.906972967218664 |
| 19 | 6.674670202869423 | 10.601046748584544 | 3.9263765457151205 |
| 20 | 10.601046748584544 | 39.71941181091317 | 29.11836506232863 |
| 21 | 39.71941181091317 | 49.2147989416624 | 9.49538713074923 |
| 22 | 49.2147989416624 | 76.11383657713708 | 26.899037635474677 |
| 23 | 76.11383657713708 | 71.28238165058754 | 4.83145492654954 |
| 24 | 71.28238165058754 | 65.02945125241477 | 6.252930398172765 |
| 25 | 65.02945125241477 | 65.02945125241477 | 0.0 |
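
To make the pattern in the table easier to check, here is a minimal sketch in plain Python (not the Spark code itself). It recomputes the absolute differences from the logLikelihood values listed above, reports the iterations at which the difference grew instead of shrinking, and shows where a simple `abs(logLikelihood - logLikelihoodPrev) < tol` check would stop. The helper names and the tolerance values are hypothetical, chosen only for illustration.

```python
# logLikelihood values copied from the table (iterations 15..25).
# The first entry is logLikelihoodPrev at iteration 15.
log_likelihoods = [
    36.402816949681664,
    36.55682231506764,
    36.75888971475007,
    37.581643170088086,
    6.674670202869423,
    10.601046748584544,
    39.71941181091317,
    49.2147989416624,
    76.11383657713708,
    71.28238165058754,
    65.02945125241477,
    65.02945125241477,
]
FIRST_ITERATION = 15  # iteration number of the first table row


def abs_diffs(values):
    """Absolute difference between each value and its predecessor."""
    return [abs(cur - prev) for prev, cur in zip(values, values[1:])]


def first_converged_iteration(values, tol, first_iteration):
    """First iteration whose absolute difference is below tol, or None.

    This mimics a simple difference-vs-tolerance check; it is an
    illustrative assumption, not a copy of Spark's implementation.
    """
    for offset, diff in enumerate(abs_diffs(values)):
        if diff < tol:
            return first_iteration + offset
    return None


def non_shrinking_iterations(values, first_iteration):
    """Iterations whose difference is larger than the previous one."""
    diffs = abs_diffs(values)
    return [
        first_iteration + i
        for i in range(1, len(diffs))
        if diffs[i] > diffs[i - 1]
    ]


def likelihood_drops(values, first_iteration):
    """Iterations at which the logLikelihood itself decreased."""
    return [
        first_iteration + i
        for i, (prev, cur) in enumerate(zip(values, values[1:]))
        if cur < prev
    ]


print(non_shrinking_iterations(log_likelihoods, FIRST_ITERATION))
print(likelihood_drops(log_likelihoods, FIRST_ITERATION))
print(first_converged_iteration(log_likelihoods, 0.01, FIRST_ITERATION))
print(first_converged_iteration(log_likelihoods, 0.2, FIRST_ITERATION))
```

On this data the difference grows at iterations 16, 17, 18, 20, 22 and 24, and the logLikelihood itself decreases at iterations 18, 23 and 24. A tolerance of 0.01 is only met at iteration 25, while a looser tolerance such as 0.2 would already stop at iteration 15, far from the final value. That is why comparing the final logLikelihood after a large maxIter looks more robust to me than comparing at a fixed early iteration.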
