[jira] [Updated] (SPARK-15291) Remove redundant codes in SVD++

2016-05-18 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-15291: - Description: {code} val newVertices = g.vertices.mapValues(v => (v._1.toArray, v._2.toArray, v._3

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2016-05-21 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Description: The MiniBatchKMeans is a variant of the KMeans algorithm which uses mini-batches to

[jira] [Resolved] (SPARK-34047) save tree models in single partition

2021-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34047. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31090 [https://gi

[jira] [Assigned] (SPARK-34047) save tree models in single partition

2021-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34047: Assignee: zhengruifeng > save tree models in single partition > -

[jira] [Created] (SPARK-34189) w2v findSynonyms optimization

2021-01-21 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34189: Summary: w2v findSynonyms optimization Key: SPARK-34189 URL: https://issues.apache.org/jira/browse/SPARK-34189 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-34189) w2v findSynonyms optimization

2021-01-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34189: Assignee: zhengruifeng > w2v findSynonyms optimization > - >

[jira] [Created] (SPARK-34220) BucketedRandomProjectionLSH transform opt

2021-01-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34220: Summary: BucketedRandomProjectionLSH transform opt Key: SPARK-34220 URL: https://issues.apache.org/jira/browse/SPARK-34220 Project: Spark Issue Type: Improve

[jira] [Assigned] (SPARK-34220) BucketedRandomProjectionLSH transform opt

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34220: Assignee: zhengruifeng > BucketedRandomProjectionLSH transform opt >

[jira] [Resolved] (SPARK-34220) BucketedRandomProjectionLSH transform opt

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34220. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31313 [https://gi

[jira] [Assigned] (SPARK-34189) w2v findSynonyms optimization

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34189: Assignee: zhengruifeng (was: Apache Spark) > w2v findSynonyms optimization > ---

[jira] [Resolved] (SPARK-34189) w2v findSynonyms optimization

2021-01-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34189. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31276 [https://gi

[jira] [Created] (SPARK-34256) VectorSlicer refine numFeatures checking and toString method

2021-01-26 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34256: Summary: VectorSlicer refine numFeatures checking and toString method Key: SPARK-34256 URL: https://issues.apache.org/jira/browse/SPARK-34256 Project: Spark

[jira] [Created] (SPARK-34291) LSH hashDistance optimization

2021-01-29 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34291: Summary: LSH hashDistance optimization Key: SPARK-34291 URL: https://issues.apache.org/jira/browse/SPARK-34291 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-34256) VectorSlicer refine numFeatures checking and toString method

2021-01-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34256: Assignee: zhengruifeng > VectorSlicer refine numFeatures checking and toString method > -

[jira] [Resolved] (SPARK-34256) VectorSlicer refine numFeatures checking and toString method

2021-01-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34256. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31354 [https://gi

[jira] [Created] (SPARK-34307) TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition

2021-01-31 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34307: Summary: TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition Key: SPARK-34307 URL: https://issues.apache.org/jira/browse/SPARK-34307 Project: Sp

[jira] [Created] (SPARK-34353) CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-03 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34353: Summary: CollectLimitExec avoid shuffle if input rdd has single partition Key: SPARK-34353 URL: https://issues.apache.org/jira/browse/SPARK-34353 Project: Spark

[jira] [Created] (SPARK-34356) OVR transform avoid potential column conflict

2021-02-04 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34356: Summary: OVR transform avoid potential column conflict Key: SPARK-34356 URL: https://issues.apache.org/jira/browse/SPARK-34356 Project: Spark Issue Type: Imp

[jira] [Assigned] (SPARK-34356) OVR transform avoid potential column conflict

2021-02-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34356: Assignee: zhengruifeng > OVR transform avoid potential column conflict >

[jira] [Updated] (SPARK-34356) OVR transform fix potential column conflict

2021-02-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34356: - Summary: OVR transform fix potential column conflict (was: OVR transform avoid potential column

[jira] [Created] (SPARK-34470) VectorSlicer use ordering if possible

2021-02-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34470: Summary: VectorSlicer use ordering if possible Key: SPARK-34470 URL: https://issues.apache.org/jira/browse/SPARK-34470 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290642#comment-17290642 ] zhengruifeng commented on SPARK-34448: -- [~srowen] Thanks for pinging me, I am going

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng commented on SPARK-34448: -- [~srowen] [~weichenxu123] [~ykerzhner] M

[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:29 AM: --

[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:30 AM: --

[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:33 AM: --

[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291384#comment-17291384 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:41 AM: --

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291384#comment-17291384 ] zhengruifeng commented on SPARK-34448: -- My test code and log is here > Binary logi

[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:56 AM: --

[jira] [Comment Edited] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291378#comment-17291378 ] zhengruifeng edited comment on SPARK-34448 at 2/26/21, 4:58 AM: --

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291559#comment-17291559 ] zhengruifeng commented on SPARK-34448: -- 1, I just make a simple impl(https://githu

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291562#comment-17291562 ] zhengruifeng commented on SPARK-34448: -- I am not sure to: 1, center the vector in e

[jira] [Created] (SPARK-34765) Linear Models standardization optimization

2021-03-16 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34765: Summary: Linear Models standardization optimization Key: SPARK-34765 URL: https://issues.apache.org/jira/browse/SPARK-34765 Project: Spark Issue Type: Improv

[jira] [Updated] (SPARK-34765) Linear Models standardization optimization

2021-03-16 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34765: - Issue Type: Umbrella (was: Improvement) > Linear Models standardization optimization >

[jira] [Updated] (SPARK-34765) Linear Models standardization optimization

2021-03-16 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34765: - Description: Existing impl of standardization in linear models does *NOT* center the vectors by

[jira] [Resolved] (SPARK-32060) Huber loss Convergence

2021-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-32060. -- Resolution: Resolved > Huber loss Convergence > -- > > Key

[jira] [Resolved] (SPARK-31783) Performance test on dense and sparse datasets

2021-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31783. -- Resolution: Resolved > Performance test on dense and sparse datasets > ---

[jira] [Resolved] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm

2021-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31714. -- Resolution: Resolved > Performance test on java vectorization vs dot vs gemv vs gemm > ---

[jira] [Resolved] (SPARK-32061) potential regression if use memoryUsage instead of numRows

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-32061. -- Resolution: Resolved > potential regression if use memoryUsage instead of numRows > --

[jira] [Assigned] (SPARK-31661) Document usage of blockSize

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31661: Assignee: zhengruifeng > Document usage of blockSize > --- > >

[jira] [Assigned] (SPARK-31976) use MemoryUsage to control the size of block

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31976: Assignee: zhengruifeng > use MemoryUsage to control the size of block > -

[jira] [Updated] (SPARK-34765) Linear Models standardization optimization

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34765: - Issue Type: Improvement (was: Umbrella) > Linear Models standardization optimization >

[jira] [Updated] (SPARK-34765) Linear Models standardization optimization

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34765: - Parent: SPARK-30641 Issue Type: Sub-task (was: Improvement) > Linear Models standardiza

[jira] [Created] (SPARK-34797) Refactor Logistic Aggregator

2021-03-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34797: Summary: Refactor Logistic Aggregator Key: SPARK-34797 URL: https://issues.apache.org/jira/browse/SPARK-34797 Project: Spark Issue Type: Sub-task C

[jira] [Updated] (SPARK-34797) Refactor Logistic Aggregator

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34797: - Description: 1, add BinaryLogisticBlockAggregator and MultinomialLogisticBlockAggregator and re

[jira] [Updated] (SPARK-34797) Refactor Logistic Aggregator - support virtual centering

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34797: - Summary: Refactor Logistic Aggregator - support virtual centering (was: Refactor Logistic Aggre

[jira] [Updated] (SPARK-30641) ML algs blockify input vectors

2021-03-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Affects Version/s: 3.2.0 > ML algs blockify input vectors > -- > >

[jira] [Assigned] (SPARK-34470) VectorSlicer use ordering if possible

2021-03-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34470: Assignee: zhengruifeng > VectorSlicer use ordering if possible >

[jira] [Resolved] (SPARK-34470) VectorSlicer use ordering if possible

2021-03-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34470. -- Resolution: Resolved > VectorSlicer use ordering if possible > ---

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Summary: Project Matrix: Linear Models revisit and refactor (was: ML algs blockify input vector

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Assigned] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-30641: Assignee: (was: zhengruifeng) > Project Matrix: Linear Models revisit and refactor >

[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-03-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there still are some wo

[jira] [Created] (SPARK-34858) Binary Logistic Regression with intercept support centering

2021-03-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34858: Summary: Binary Logistic Regression with intercept support centering Key: SPARK-34858 URL: https://issues.apache.org/jira/browse/SPARK-34858 Project: Spark

[jira] [Created] (SPARK-34860) Multinomial Logistic Regression with intercept support centering

2021-03-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34860: Summary: Multinomial Logistic Regression with intercept support centering Key: SPARK-34860 URL: https://issues.apache.org/jira/browse/SPARK-34860 Project: Spark

[jira] [Created] (SPARK-36956) model prediction in .mllib avoid conversion to breeze vector

2021-10-08 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-36956: Summary: model prediction in .mllib avoid conversion to breeze vector Key: SPARK-36956 URL: https://issues.apache.org/jira/browse/SPARK-36956 Project: Spark

[jira] [Created] (SPARK-36963) Add max_by/min_by to sql.functions

2021-10-08 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-36963: Summary: Add max_by/min_by to sql.functions Key: SPARK-36963 URL: https://issues.apache.org/jira/browse/SPARK-36963 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-10-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 80% usage of window function follows this pattern:

[jira] [Created] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-10-22 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-37099: Summary: Impl a rank-based filter to optimize top-k computation Key: SPARK-37099 URL: https://issues.apache.org/jira/browse/SPARK-37099 Project: Spark Issue

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-10-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 80% usage of window function follows this pattern:

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-10-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 80% usage of window function follows this pattern:

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-10-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Attachment: skewed_window.png > Impl a rank-based filter to optimize top-k computation > ---

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-11-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 90% usage of window function follows this pattern:

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-11-25 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 90% usage of window function follows this pattern:

[jira] [Created] (SPARK-37597) Deduplicate the right side of left-semi join and left-anti join

2021-12-09 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-37597: Summary: Deduplicate the right side of left-semi join and left-anti join Key: SPARK-37597 URL: https://issues.apache.org/jira/browse/SPARK-37597 Project: Spark

[jira] [Resolved] (SPARK-37597) Deduplicate the right side of left-semi join and left-anti join

2021-12-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-37597. -- Resolution: Duplicate > Deduplicate the right side of left-semi join and left-anti join >

[jira] [Updated] (SPARK-38588) Validate input dataset of ml.classification

2022-03-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38588: - Summary: Validate input dataset of ml.classification (was: Validate input dataset of LinearSVC)

[jira] [Updated] (SPARK-38588) Validate input dataset of ml.classification

2022-03-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38588: - Fix Version/s: 3.4.0 > Validate input dataset of ml.classification > ---

[jira] [Resolved] (SPARK-38588) Validate input dataset of ml.classification

2022-03-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-38588. -- Resolution: Resolved > Validate input dataset of ml.classification > -

[jira] [Created] (SPARK-38643) Validate input dataset of ml.regression

2022-03-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38643: Summary: Validate input dataset of ml.regression Key: SPARK-38643 URL: https://issues.apache.org/jira/browse/SPARK-38643 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-38669) Validate input dataset of ml.clustering

2022-03-27 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38669: Summary: Validate input dataset of ml.clustering Key: SPARK-38669 URL: https://issues.apache.org/jira/browse/SPARK-38669 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 90% usage of window function follows this pattern:

[jira] [Updated] (SPARK-37099) Introduce a rank-based filter to optimize top-k computation

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Summary: Introduce a rank-based filter to optimize top-k computation (was: Impl a rank-based fi

[jira] [Updated] (SPARK-37099) Introduce a rank-based filter to optimize top-k computation

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Affects Version/s: 3.4.0 (was: 3.3.0) > Introduce a rank-based filter

[jira] [Updated] (SPARK-36638) Generalize OptimizeSkewedJoin

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-36638: - Affects Version/s: 3.4.0 (was: 3.3.0) > Generalize OptimizeSkewedJoin

[jira] [Created] (SPARK-38774) impl Series.autocorr

2022-04-02 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38774: Summary: impl Series.autocorr Key: SPARK-38774 URL: https://issues.apache.org/jira/browse/SPARK-38774 Project: Spark Issue Type: Sub-task Component

[jira] [Created] (SPARK-38775) cleanup validation functions

2022-04-02 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38775: Summary: cleanup validation functions Key: SPARK-38775 URL: https://issues.apache.org/jira/browse/SPARK-38775 Project: Spark Issue Type: Sub-task C

[jira] [Assigned] (SPARK-38775) cleanup validation functions

2022-04-02 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-38775: Assignee: zhengruifeng > cleanup validation functions > > >

[jira] [Created] (SPARK-38785) impl Series.ewm and DataFrame.ewm

2022-04-04 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38785: Summary: impl Series.ewm and DataFrame.ewm Key: SPARK-38785 URL: https://issues.apache.org/jira/browse/SPARK-38785 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-38785) impl Series.ewm and DataFrame.ewm

2022-04-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516846#comment-17516846 ] zhengruifeng commented on SPARK-38785: -- h1. Pandas API on Spark: [EWM|https://pand

[jira] [Created] (SPARK-38844) impl Series.interpolate and DataFrame.interpolate

2022-04-09 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38844: Summary: impl Series.interpolate and DataFrame.interpolate Key: SPARK-38844 URL: https://issues.apache.org/jira/browse/SPARK-38844 Project: Spark Issue Type:

[jira] [Created] (SPARK-38907) Impl DataFrame.corrwith

2022-04-14 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38907: Summary: Impl DataFrame.corrwith Key: SPARK-38907 URL: https://issues.apache.org/jira/browse/SPARK-38907 Project: Spark Issue Type: Sub-task Compon

[jira] [Created] (SPARK-38937) interpolate support param `limit_direction`

2022-04-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38937: Summary: interpolate support param `limit_direction` Key: SPARK-38937 URL: https://issues.apache.org/jira/browse/SPARK-38937 Project: Spark Issue Type: Impro

[jira] [Created] (SPARK-38943) EWM support ignore_na

2022-04-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38943: Summary: EWM support ignore_na Key: SPARK-38943 URL: https://issues.apache.org/jira/browse/SPARK-38943 Project: Spark Issue Type: Improvement Compo

[jira] [Created] (SPARK-38993) Impl DataFrame.boxplot and DataFrame.plot.box

2022-04-21 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38993: Summary: Impl DataFrame.boxplot and DataFrame.plot.box Key: SPARK-38993 URL: https://issues.apache.org/jira/browse/SPARK-38993 Project: Spark Issue Type: Sub

[jira] [Created] (SPARK-39081) Impl DataFrame.resample and Series.resample

2022-04-30 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39081: Summary: Impl DataFrame.resample and Series.resample Key: SPARK-39081 URL: https://issues.apache.org/jira/browse/SPARK-39081 Project: Spark Issue Type: Sub-t
