[GitHub] [spark] zhengruifeng edited a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-06 Thread GitBox
zhengruifeng edited a comment on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624985569 This PR is an update of https://github.com/apache/spark/pull/27374; by default (with blockSize=1) it avoids a performance regression on sparse datasets. On dense

[GitHub] [spark] zhengruifeng edited a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox
zhengruifeng edited a comment on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624427337 Performance test on a **sparse dataset**: the first 10,000 instances of `webspam_wc_normalized_trigram` code: ```scala val df =