[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-25 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698100080







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-25 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698905421


   Address f2a26c5 fix RandomForestRegressorSuite.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-25 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698899448


   ok ~



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-25 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698819722


   Synchronize the test result:
   
   
https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L194-L202
   
   - **Test A:** Change to use `LinkedHashMap` to store topNodesForGroup and 
use `nodesForGroup.keys.toSeq.sorted` to init it was no essential impact.
   
   - **Test B:** Change to use `LinkedHashMap` to store `topNodesForGroup` and 
`nodesForGroup` seems that the goal of unified behavior in Scala 2.12 and 2.13 
can be achieved,  but need change the tol in the last test to 0.77(the result 
is 0.771)
   
   - **Test C:** Only change the tol in the last test to 0.75 can workaround
   
   Which do you recommend better @srowen , B or C?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-24 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698679002


   > f you like you can try making the ordering of the Map in this code 
deterministic to see if that does it:
   
   val topNodesForGroup: Map[Int, LearningNode] = 
nodesForGroup.keys.map(treeIdx => treeIdx -> topNodes(treeIdx)).toMap 
   But, I don't even know if the result the test is complaining about is 
'wrong'.
   
   OK ~ I will try this first and feedback later.
   
   But change 0.78 to 0.75 seems simpler, haha :)
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698123928


   @dongjoon-hyun Address 4f5eac5 rebase master



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698112073


   
https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala#L36-L54
   
   And I found if we change `nPoints` from 1000 to 1150, the test will pass



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698111573


   cc @srowen The remaining failed case is 
   
   ```
   RandomForestRegressorSuite:
   - training with sample weights *** FAILED ***
 0.756 was not greater than or equal to 0.78 (MLTestingUtils.scala:285)
   ```
   
   
https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala#L171-L200
   
   Input `(50, 10, 0.95, 0.78)` with 
   
   ```
   MLTestingUtils.testOversamplingVsWeighting[RandomForestRegressionModel,
   RandomForestRegressor](df.as[LabeledPoint], estimator,
   MLTestingUtils.modelPredictionEquals(df, _ ~= _ relTol 0.2, tol), 
seed)
   ```
   failed.
   
   I found that the following `RandomForest.runBagged` behave differently for 
the same input in Scala 2.12 and Scala 2.13, maybe related to the follow code 
block:
   
   
https://github.com/apache/spark/blob/0bc0e91e4015eb98bd2f4bf17da2ec7135b520a9/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L191-L215
   
   but I am not familiar with this algorithm and I not find root cause, I think 
we need an expert to guide how to fix it
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698100080


   cc @dongjoon-hyun https://github.com/apache/spark/pull/29861 fix GitHub 2.13 
build Action, related to k8s module, I will rebase this pr after it merged.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org