[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-06 Thread GitBox
zhengruifeng commented on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624987233 Merged to master This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-06 Thread GitBox
zhengruifeng commented on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624985569 This PR is an update of https://github.com/apache/spark/pull/27374; it avoids a performance regression on sparse datasets by default (with blockSize=1). On dense

[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-06 Thread GitBox
zhengruifeng commented on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624487752 retest this please

[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-06 Thread GitBox
zhengruifeng commented on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624470597 retest this please

[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox
zhengruifeng commented on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624427337 performance test on the first 10,000 instances of `webspam_wc_normalized_trigram` code: ```scala val df = spark.read.option("numFeatures",

[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox
zhengruifeng commented on pull request #28458: URL: https://github.com/apache/spark/pull/28458#issuecomment-624426340 performance test on [`epsilon_normalized.t`](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) code: ```scala import