[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:29 AM: --

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng commented on SPARK-34448: -- [~srowen] [~weichenxu123]  [~ykerzhner] M

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290642#comment-17290642 ] zhengruifeng commented on SPARK-34448: -- [~srowen] Thanks for pinging me, I am going

[jira] [Created] (SPARK-34470) VectorSlicer use ordering if possible

2021-02-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34470: Summary: VectorSlicer use ordering if possible Key: SPARK-34470 URL: https://issues.apache.org/jira/browse/SPARK-34470 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-34356) OVR transform fix potential column conflict

2021-02-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34356: - Summary: OVR transform fix potential column conflict (was: OVR transform avoid potential column

[jira] [Assigned] (SPARK-34356) OVR transform avoid potential column conflict

2021-02-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34356: Assignee: zhengruifeng > OVR transform avoid potential column conflict >

[jira] [Created] (SPARK-34356) OVR transform avoid potential column conflict

2021-02-04 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34356: Summary: OVR transform avoid potential column conflict Key: SPARK-34356 URL: https://issues.apache.org/jira/browse/SPARK-34356 Project: Spark Issue Type: Imp

[jira] [Created] (SPARK-34353) CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-03 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34353: Summary: CollectLimitExec avoid shuffle if input rdd has single partition Key: SPARK-34353 URL: https://issues.apache.org/jira/browse/SPARK-34353 Project: Spark

[jira] [Created] (SPARK-34307) TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition

2021-01-31 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34307: Summary: TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition Key: SPARK-34307 URL: https://issues.apache.org/jira/browse/SPARK-34307 Project: Sp

[jira] [Resolved] (SPARK-34256) VectorSlicer refine numFeatures checking and toString method

2021-01-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34256. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31354 [https://gi

[jira] [Assigned] (SPARK-34256) VectorSlicer refine numFeatures checking and toString method

2021-01-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34256: Assignee: zhengruifeng > VectorSlicer refine numFeatures checking and toString method > -

[jira] [Created] (SPARK-34291) LSH hashDistance optimization

2021-01-29 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34291: Summary: LSH hashDistance optimization Key: SPARK-34291 URL: https://issues.apache.org/jira/browse/SPARK-34291 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-34256) VectorSlicer refine numFeatures checking and toString method

2021-01-26 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34256: Summary: VectorSlicer refine numFeatures checking and toString method Key: SPARK-34256 URL: https://issues.apache.org/jira/browse/SPARK-34256 Project: Spark

[jira] [Resolved] (SPARK-34189) w2v findSynonyms optimization

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34189. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31276 [https://gi

[jira] [Assigned] (SPARK-34189) w2v findSynonyms optimization

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34189: Assignee: zhengruifeng (was: Apache Spark) > w2v findSynonyms optimization > ---

[jira] [Resolved] (SPARK-34220) BucketedRandomProjectionLSH transform opt

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34220. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31313 [https://gi

[jira] [Assigned] (SPARK-34220) BucketedRandomProjectionLSH transform opt

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34220: Assignee: zhengruifeng > BucketedRandomProjectionLSH transform opt >

[jira] [Created] (SPARK-34220) BucketedRandomProjectionLSH transform opt

2021-01-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34220: Summary: BucketedRandomProjectionLSH transform opt Key: SPARK-34220 URL: https://issues.apache.org/jira/browse/SPARK-34220 Project: Spark Issue Type: Improve

[jira] [Assigned] (SPARK-34189) w2v findSynonyms optimization

2021-01-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34189: Assignee: zhengruifeng > w2v findSynonyms optimization > - >

[jira] [Created] (SPARK-34189) w2v findSynonyms optimization

2021-01-21 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34189: Summary: w2v findSynonyms optimization Key: SPARK-34189 URL: https://issues.apache.org/jira/browse/SPARK-34189 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-34047) save tree models in single partition

2021-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34047. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31090 [https://gi

[jira] [Assigned] (SPARK-34047) save tree models in single partition

2021-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34047: Assignee: zhengruifeng > save tree models in single partition > -

[jira] [Resolved] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34106. -- Resolution: Duplicate > Hide FValueTest and AnovaTest > - > >

[jira] [Created] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34106: Summary: Hide FValueTest and AnovaTest Key: SPARK-34106 URL: https://issues.apache.org/jira/browse/SPARK-34106 Project: Spark Issue Type: Sub-task

[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34106: Assignee: zhengruifeng > Hide FValueTest and AnovaTest > - >

[jira] [Created] (SPARK-34093) param maxDepth should check upper bound

2021-01-12 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34093: Summary: param maxDepth should check upper bound Key: SPARK-34093 URL: https://issues.apache.org/jira/browse/SPARK-34093 Project: Spark Issue Type: Improveme

[jira] [Assigned] (SPARK-34045) OneVsRestModel.transform should not call setter of submodels

2021-01-11 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34045: Assignee: zhengruifeng > OneVsRestModel.transform should not call setter of submodels > -

[jira] [Resolved] (SPARK-34045) OneVsRestModel.transform should not call setter of submodels

2021-01-11 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34045. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31086 [https://gi

[jira] [Resolved] (SPARK-33773) expose intermediateStorageLevel in mllib

2021-01-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-33773. -- Resolution: Not A Problem > expose intermediateStorageLevel in mllib > ---

[jira] [Created] (SPARK-34047) save tree models in single partition

2021-01-08 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34047: Summary: save tree models in single partition Key: SPARK-34047 URL: https://issues.apache.org/jira/browse/SPARK-34047 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-34045) OneVsRestModel.transform should not call setter of submodels

2021-01-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34045: - Description: featuresCol of submodels maybe changed in transform: {code:java} scala> val df =

[jira] [Created] (SPARK-34045) OneVsRestModel.transform should not call setter of submodels

2021-01-07 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34045: Summary: OneVsRestModel.transform should not call setter of submodels Key: SPARK-34045 URL: https://issues.apache.org/jira/browse/SPARK-34045 Project: Spark

[jira] [Comment Edited] (SPARK-33398) AnalysisException when loading a PipelineModel with Spark 3

2020-12-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253435#comment-17253435 ] zhengruifeng edited comment on SPARK-33398 at 12/22/20, 11:50 AM:

[jira] [Commented] (SPARK-33398) AnalysisException when loading a PipelineModel with Spark 3

2020-12-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253435#comment-17253435 ] zhengruifeng commented on SPARK-33398: -- [~nmarcott]  This issue also exists in RF/G

[jira] [Commented] (SPARK-33398) AnalysisException when loading a PipelineModel with Spark 3

2020-12-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253293#comment-17253293 ] zhengruifeng commented on SPARK-33398: -- [~nmarcott]  I can reproduce this failure,

[jira] [Resolved] (SPARK-31948) expose mapSideCombine in aggByKey/reduceByKey/foldByKey

2020-12-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31948. -- Resolution: Not A Problem > expose mapSideCombine in aggByKey/reduceByKey/foldByKey >

[jira] [Resolved] (SPARK-31976) use MemoryUsage to control the size of block

2020-12-15 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31976. -- Resolution: Resolved > use MemoryUsage to control the size of block >

[jira] [Resolved] (SPARK-31661) Document usage of blockSize

2020-12-15 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31661. -- Resolution: Resolved > Document usage of blockSize > --- > >

[jira] [Created] (SPARK-33773) expose intermediateStorageLevel in mllib

2020-12-14 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33773: Summary: expose intermediateStorageLevel in mllib Key: SPARK-33773 URL: https://issues.apache.org/jira/browse/SPARK-33773 Project: Spark Issue Type: Improvem

[jira] [Resolved] (SPARK-33609) word2vec reduce broadcast size

2020-12-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-33609. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30548 [https://gi

[jira] [Assigned] (SPARK-33609) word2vec reduce broadcast size

2020-12-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-33609: Assignee: zhengruifeng > word2vec reduce broadcast size > --

[jira] [Resolved] (SPARK-32320) Remove mutable default arguments

2020-12-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-32320. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 29122 [https://gi

[jira] [Assigned] (SPARK-32320) Remove mutable default arguments

2020-12-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-32320: Assignee: Fokko Driesprong > Remove mutable default arguments > -

[jira] [Resolved] (SPARK-33610) Imputer transform skip head() job

2020-12-02 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-33610. -- Resolution: Resolved > Imputer transform skip head() job > - >

[jira] [Created] (SPARK-33610) Imputer transform skip head() job

2020-11-30 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33610: Summary: Imputer transform skip head() job Key: SPARK-33610 URL: https://issues.apache.org/jira/browse/SPARK-33610 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-33609) word2vec reduce broadcast size

2020-11-30 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33609: Summary: word2vec reduce broadcast size Key: SPARK-33609 URL: https://issues.apache.org/jira/browse/SPARK-33609 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-33518) Improve performance of ML ALS recommendForAll by GEMV

2020-11-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-33518: - Description: There has been a lot of work on improving ALS's {{recommendForAll}}. For now, I found

[jira] [Updated] (SPARK-33518) Improve performance of ML ALS recommendForAll by GEMV

2020-11-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-33518: - Affects Version/s: (was: 3.2.0) 3.1.0 > Improve performance of ML ALS

[jira] [Created] (SPARK-33518) Improve performance of ML ALS recommendForAll by GEMV

2020-11-23 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33518: Summary: Improve performance of ML ALS recommendForAll by GEMV Key: SPARK-33518 URL: https://issues.apache.org/jira/browse/SPARK-33518 Project: Spark Issue T

[jira] [Updated] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-33466: - Component/s: PySpark > Imputer support mode(most_frequent) strategy > --

[jira] [Created] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33466: Summary: Imputer support mode(most_frequent) strategy Key: SPARK-33466 URL: https://issues.apache.org/jira/browse/SPARK-33466 Project: Spark Issue Type: New

[jira] [Assigned] (SPARK-32691) Update commons-crypto to v1.1.0

2020-11-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-32691: Assignee: huangtianhua (was: zhengruifeng) > Update commons-crypto to v1.1.0 > -

[jira] [Assigned] (SPARK-32691) Test org.apache.spark.DistributedSuite failed on arm64 jenkins

2020-10-16 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-32691: Assignee: zhengruifeng > Test org.apache.spark.DistributedSuite failed on arm64 jenkins >

[jira] [Commented] (SPARK-32691) Test org.apache.spark.DistributedSuite failed on arm64 jenkins

2020-10-15 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214516#comment-17214516 ] zhengruifeng commented on SPARK-32691: -- [~huangtianhua] It looks like that this is

[jira] [Created] (SPARK-33111) aft transform optimization

2020-10-10 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33111: Summary: aft transform optimization Key: SPARK-33111 URL: https://issues.apache.org/jira/browse/SPARK-33111 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-32907) adaptively blockify instances

2020-10-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32907: - Attachment: blockify_svc_perf_20201010.xlsx > adaptively blockify instances > --

[jira] [Commented] (SPARK-32691) Test org.apache.spark.DistributedSuite failed on arm64 jenkins

2020-09-27 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202800#comment-17202800 ] zhengruifeng commented on SPARK-32691: -- [~huangtianhua]  [~dongjoon] I just see tha

[jira] [Assigned] (SPARK-32974) FeatureHasher transform optimization

2020-09-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-32974: Assignee: zhengruifeng > FeatureHasher transform optimization > -

[jira] [Resolved] (SPARK-32974) FeatureHasher transform optimization

2020-09-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-32974. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29850 [https://gi

[jira] [Commented] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201852#comment-17201852 ] zhengruifeng commented on SPARK-32973: -- yes, "real" is ignored here. Since it has

[jira] [Updated] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32973: - Component/s: Documentation > FeatureHasher does not check categoricalCols in inputCols > ---

[jira] [Created] (SPARK-32974) FeatureHasher transform optimization

2020-09-23 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32974: Summary: FeatureHasher transform optimization Key: SPARK-32974 URL: https://issues.apache.org/jira/browse/SPARK-32974 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200717#comment-17200717 ] zhengruifeng commented on SPARK-32973: -- ping [~srowen]  [~huaxingao]  [~mlnick] >

[jira] [Updated] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32973: - Description: doc related to {{categoricalCols}}: {code:java} Numeric columns to treat as categor

[jira] [Updated] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32973: - Description: doc related to {{categoricalCols}}: {code:java} Numeric columns to treat as categor

[jira] [Updated] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32973: - Description: doc related to {{categoricalCols}}: {code:java} Numeric columns to treat as categor

[jira] [Created] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32973: Summary: FeatureHasher does not check categoricalCols in inputCols Key: SPARK-32973 URL: https://issues.apache.org/jira/browse/SPARK-32973 Project: Spark Iss

[jira] [Updated] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32973: - Description: in doc related to > FeatureHasher does not check categoricalCols in inputCols > ---

[jira] [Created] (SPARK-32907) adaptively blockify instances

2020-09-17 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32907: Summary: adaptively blockify instances Key: SPARK-32907 URL: https://issues.apache.org/jira/browse/SPARK-32907 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-28958) pyspark.ml function parity

2020-09-10 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-28958. -- Resolution: Resolved > pyspark.ml function parity > -- > >

[jira] [Commented] (SPARK-29967) KMeans support instance weighting

2020-08-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181558#comment-17181558 ] zhengruifeng commented on SPARK-29967: -- [~YuQiang Ye] I open ticket SPARK-32676 for

[jira] [Updated] (SPARK-32676) Fix double caching in KMeans/BiKMeans

2020-08-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32676: - Description: On the .mllib side, if the storageLevel of input {{data}} is always ignored and ca

[jira] [Created] (SPARK-32676) Fix double caching in KMeans/BiKMeans

2020-08-20 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32676: Summary: Fix double caching in KMeans/BiKMeans Key: SPARK-32676 URL: https://issues.apache.org/jira/browse/SPARK-32676 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-29967) KMeans support instance weighting

2020-08-19 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180937#comment-17180937 ] zhengruifeng commented on SPARK-29967: -- [~YuQiang Ye] can you provide more details

[jira] [Created] (SPARK-32457) logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32457: Summary: logParam thresholds in DT/GBT/FM/LR/MLP Key: SPARK-32457 URL: https://issues.apache.org/jira/browse/SPARK-32457 Project: Spark Issue Type: Improvem

[jira] [Created] (SPARK-32455) LogisticRegressionModel prediction optimization

2020-07-27 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32455: Summary: LogisticRegressionModel prediction optimization Key: SPARK-32455 URL: https://issues.apache.org/jira/browse/SPARK-32455 Project: Spark Issue Type: I

[jira] [Created] (SPARK-32384) repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2020-07-22 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32384: Summary: repartitionAndSortWithinPartitions avoid shuffle with same partitioner Key: SPARK-32384 URL: https://issues.apache.org/jira/browse/SPARK-32384 Project: Spark

[jira] [Created] (SPARK-32298) tree models prediction optimization

2020-07-13 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32298: Summary: tree models prediction optimization Key: SPARK-32298 URL: https://issues.apache.org/jira/browse/SPARK-32298 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-32202) tree models auto infer compact integer type

2020-07-06 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32202: Summary: tree models auto infer compact integer type Key: SPARK-32202 URL: https://issues.apache.org/jira/browse/SPARK-32202 Project: Spark Issue Type: Impro

[jira] [Created] (SPARK-32164) GeneralizedLinearRegressionSummary optimization

2020-07-02 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32164: Summary: GeneralizedLinearRegressionSummary optimization Key: SPARK-32164 URL: https://issues.apache.org/jira/browse/SPARK-32164 Project: Spark Issue Type: I

[jira] [Commented] (SPARK-3181) Add Robust Regression Algorithm with Huber Estimator

2020-06-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148452#comment-17148452 ] zhengruifeng commented on SPARK-3181: - I am working on blockify+gemv/gemm for better

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: (was: image-2020-06-28-18-05-28-867.png) > Huber loss Convergence >

[jira] [Comment Edited] (SPARK-32060) Huber loss Convergence

2020-06-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148412#comment-17148412 ] zhengruifeng edited comment on SPARK-32060 at 6/30/20, 8:20 AM: --

[jira] [Comment Edited] (SPARK-32060) Huber loss Convergence

2020-06-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148412#comment-17148412 ] zhengruifeng edited comment on SPARK-32060 at 6/30/20, 8:17 AM: --

[jira] [Commented] (SPARK-32060) Huber loss Convergence

2020-06-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148412#comment-17148412 ] zhengruifeng commented on SPARK-32060: -- I found that the optimization of Huber Loss

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-30 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Description: |performance test in https://issues.apache.org/jira/browse/SPARK-31783, Huber loss

[jira] [Commented] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147295#comment-17147295 ] zhengruifeng commented on SPARK-32060: -- According to the convergence curves of diff

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: image-2020-06-28-18-05-28-867.png > Huber loss Convergence > --

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: huber.xlsx > Huber loss Convergence > -- > > Key

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: (was: huber.xlsx) > Huber loss Convergence > -- > >

[jira] [Comment Edited] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147261#comment-17147261 ] zhengruifeng edited comment on SPARK-32060 at 6/28/20, 8:34 AM: --

[jira] [Comment Edited] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147261#comment-17147261 ] zhengruifeng edited comment on SPARK-32060 at 6/28/20, 8:34 AM: --

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: huber.xlsx > Huber loss Convergence > -- > > Key

[jira] [Commented] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147261#comment-17147261 ] zhengruifeng commented on SPARK-32060: -- {code:java} import org.apache.spark.ml.regr

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Description: |performance test in https://issues.apache.org/jira/browse/SPARK-31783, Huber loss

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: (was: huber.xlsx) > Huber loss Convergence > -- > >

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-28 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Attachment: huber.xlsx > Huber loss Convergence > -- > > Key

[jira] [Created] (SPARK-32061) potential regression if use memoryUsage instead of numRows

2020-06-22 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-32061: Summary: potential regression if use memoryUsage instead of numRows Key: SPARK-32061 URL: https://issues.apache.org/jira/browse/SPARK-32061 Project: Spark Is

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Description: |performance test in https://issues.apache.org/jira/browse/SPARK-31783, Huber loss s

[jira] [Updated] (SPARK-32060) Huber loss Convergence

2020-06-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32060: - Parent: SPARK-30641 Issue Type: Sub-task (was: Bug) > Huber loss Convergence >
