[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

WeichenXu123 Mon, 18 Sep 2017 23:38:38 -0700

Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19229
  
    Oh, that is what I did in the old PR #18902, because the RDD version (not
in the master branch, only my personal implementation; sorry for posting the
wrong link earlier, the code is here:
https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465)
is faster than the DataFrame version on current Spark. Now that your PR
improves the performance, I would like to compare them again. We hope to
track this performance gap and try to resolve it in the future. Based on
another similar case of mine, the DataFrame version is currently about 2-3x
slower than the RDD version when numCols == 100. If you have no time to run
the comparison, I can help do it. Thanks!



