[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698100080 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698905421 Address f2a26c5 fix RandomForestRegressorSuite. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698899448 ok ~ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698819722 Synchronize the test result: https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L194-L202 - **Test A:** Change to use `LinkedHashMap` to store topNodesForGroup and use `nodesForGroup.keys.toSeq.sorted` to init it was no essential impact. - **Test B:** Change to use `LinkedHashMap` to store `topNodesForGroup` and `nodesForGroup` seems that the goal of unified behavior in Scala 2.12 and 2.13 can be achieved, but need change the tol in the last test to 0.77(the result is 0.771) - **Test C:** Only change the tol in the last test to 0.75 can workaround Which do you recommend better @srowen , B or C? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698679002 > f you like you can try making the ordering of the Map in this code deterministic to see if that does it: val topNodesForGroup: Map[Int, LearningNode] = nodesForGroup.keys.map(treeIdx => treeIdx -> topNodes(treeIdx)).toMap But, I don't even know if the result the test is complaining about is 'wrong'. OK ~ I will try this first and feedback later. But change 0.78 to 0.75 seems simpler, haha :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698123928 @dongjoon-hyun Address 4f5eac5 rebase master This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698112073 https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala#L36-L54 And I found if we change `nPoints` from 1000 to 1150, the test will pass This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698111573 cc @srowen The remaining failed case is ``` RandomForestRegressorSuite: - training with sample weights *** FAILED *** 0.756 was not greater than or equal to 0.78 (MLTestingUtils.scala:285) ``` https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala#L171-L200 Input `(50, 10, 0.95, 0.78)` with ``` MLTestingUtils.testOversamplingVsWeighting[RandomForestRegressionModel, RandomForestRegressor](df.as[LabeledPoint], estimator, MLTestingUtils.modelPredictionEquals(df, _ ~= _ relTol 0.2, tol), seed) ``` failed. I found that the following `RandomForest.runBagged` behave differently for the same input in Scala 2.12 and Scala 2.13, maybe related to the follow code block: https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L191-L215 but I am not familiar with this algorithm and I not find root cause, I think we need an expert to guide how to fix it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
LuciferYang commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698100080 cc @dongjoon-hyun https://github.com/apache/spark/pull/29861 fix GitHub 2.13 build Action, related to k8s module, I will rebase this pr after it merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org