[jira] [Created] (SPARK-39598) Make *cache*, *catalog* in the python side support 3-layer-namespace

2022-06-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39598: Summary: Make *cache*, *catalog* in the python side support 3-layer-namespace Key: SPARK-39598 URL: https://issues.apache.org/jira/browse/SPARK-39598 Project: Spark

[jira] [Created] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

2022-06-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39597: Summary: Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace Key: SPARK-39597 URL: https://issues.apache.org/jira/browse/SPARK-39597

[jira] [Created] (SPARK-39579) Make ListFunctions API compatible

2022-06-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39579: Summary: Make ListFunctions API compatible Key: SPARK-39579 URL: https://issues.apache.org/jira/browse/SPARK-39579 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-39555) Make createTable and listTables in the python side support 3-layer-namespace

2022-06-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39555: - Description: Corresponding changes in the python side of SPARK-39236 (Make CreateTable API and

[jira] [Created] (SPARK-39555) Make createTable and listTables in the python side support 3-layer-namespace

2022-06-22 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39555: Summary: Make createTable and listTables in the python side support 3-layer-namespace Key: SPARK-39555 URL: https://issues.apache.org/jira/browse/SPARK-39555

[jira] [Updated] (SPARK-39555) Make createTable and listTables in the python side support 3-layer-namespace

2022-06-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39555: - Description: Corresponding changes in the python side to make > Make createTable and

[jira] [Updated] (SPARK-39533) Deprecate scoreLabelsWeight in BinaryClassificationMetrics

2022-06-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39533: - Summary: Deprecate scoreLabelsWeight in BinaryClassificationMetrics (was: Remove

[jira] [Updated] (SPARK-39533) Deprecate scoreLabelsWeight in BinaryClassificationMetrics

2022-06-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39533: - Description: scoreLabelsWeight in BinaryClassificationMetrics is a public variable, but it

[jira] [Updated] (SPARK-39534) Series.argmax only needs single pass

2022-06-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39534: - Summary: Series.argmax only needs single pass (was: Series.argmax only need one pass) >

[jira] [Created] (SPARK-39534) Series.argmax only need one pass

2022-06-20 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39534: Summary: Series.argmax only need one pass Key: SPARK-39534 URL: https://issues.apache.org/jira/browse/SPARK-39534 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-39533) Remove scoreLabelsWeight in BinaryClassificationMetrics

2022-06-20 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39533: Summary: Remove scoreLabelsWeight in BinaryClassificationMetrics Key: SPARK-39533 URL: https://issues.apache.org/jira/browse/SPARK-39533 Project: Spark

[jira] [Updated] (SPARK-39510) Leverage the natural partitioning and ordering of MonotonicallyIncreasingID

2022-06-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39510: - Summary: Leverage the natural partitioning and ordering of MonotonicallyIncreasingID (was:

[jira] [Updated] (SPARK-39510) leverage the natural partitioning and ordering of MonotonicallyIncreasingID

2022-06-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39510: - Description: In Pandas-API-on-Spark: 1, *MonotonicallyIncreasingID* and

[jira] [Updated] (SPARK-39510) leverage the natural partitioning and ordering of MonotonicallyIncreasingID

2022-06-18 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39510: - Description: In Pandas-API-on-Spark: 1, *MonotonicallyIncreasingID* and

[jira] [Created] (SPARK-39510) leverage the natural partitioning and ordering of MonotonicallyIncreasingID

2022-06-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39510: Summary: leverage the natural partitioning and ordering of MonotonicallyIncreasingID Key: SPARK-39510 URL: https://issues.apache.org/jira/browse/SPARK-39510 Project:

[jira] [Assigned] (SPARK-39284) Implement Groupby.mad

2022-06-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-39284: Assignee: zhengruifeng > Implement Groupby.mad > - > >

[jira] [Resolved] (SPARK-39284) Implement Groupby.mad

2022-06-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-39284. -- Resolution: Resolved Resolved by https://github.com/apache/spark/pull/36660 > Implement

[jira] [Assigned] (SPARK-39228) Implement `skipna` of `Series.argmax`

2022-05-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-39228: Assignee: Xinrong Meng > Implement `skipna` of `Series.argmax` >

[jira] [Resolved] (SPARK-39228) Implement `skipna` of `Series.argmax`

2022-05-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-39228. -- Resolution: Resolved resolved by https://github.com/apache/spark/pull/36599 > Implement

[jira] [Resolved] (SPARK-39300) Move pandasSkewness and pandasKurtosis into pandas.spark.functions

2022-05-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-39300. -- Resolution: Resolved > Move pandasSkewness and pandasKurtosis into pandas.spark.functions >

[jira] [Resolved] (SPARK-39268) AttachDistributedSequenceExec do not checkpoint childRDD with single partition

2022-05-26 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-39268. -- Resolution: Resolved > AttachDistributedSequenceExec do not checkpoint childRDD with single

[jira] [Created] (SPARK-39300) Move pandasSkewness and pandasKurtosis into pandas.spark.functions

2022-05-25 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39300: Summary: Move pandasSkewness and pandasKurtosis into pandas.spark.functions Key: SPARK-39300 URL: https://issues.apache.org/jira/browse/SPARK-39300 Project: Spark

[jira] [Created] (SPARK-39299) Series.autocorr use SQL.corr to avoid conversion to vector

2022-05-25 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39299: Summary: Series.autocorr use SQL.corr to avoid conversion to vector Key: SPARK-39299 URL: https://issues.apache.org/jira/browse/SPARK-39299 Project: Spark

[jira] [Created] (SPARK-39284) Implement Groupby.mad

2022-05-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39284: Summary: Implement Groupby.mad Key: SPARK-39284 URL: https://issues.apache.org/jira/browse/SPARK-39284 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-39268) AttachDistributedSequenceExec do not checkpoint childRDD with single partition

2022-05-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39268: Summary: AttachDistributedSequenceExec do not checkpoint childRDD with single partition Key: SPARK-39268 URL: https://issues.apache.org/jira/browse/SPARK-39268

[jira] [Resolved] (SPARK-39129) impl Groupby.ewm

2022-05-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-39129. -- Resolution: Resolved Resolved by https://github.com/apache/spark/pull/36486 > impl

[jira] [Commented] (SPARK-39246) Implement Groupby.skew

2022-05-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540881#comment-17540881 ] zhengruifeng commented on SPARK-39246: -- Thanks [~Qin Yao] ! > Implement Groupby.skew >

[jira] [Updated] (SPARK-39092) Propagate Empty Partitions

2022-05-23 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-39092: - Attachment: PropagateEmptyPartitions.pdf > Propagate Empty Partitions >

[jira] [Assigned] (SPARK-39129) impl Groupby.ewm

2022-05-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-39129: Assignee: zhengruifeng > impl Groupby.ewm > > > Key:

[jira] [Created] (SPARK-39246) Implement Groupby.skew

2022-05-21 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39246: Summary: Implement Groupby.skew Key: SPARK-39246 URL: https://issues.apache.org/jira/browse/SPARK-39246 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-39223) implement skew and kurt in Rolling/RollingGroupby/Expanding/ExpandingGroupby

2022-05-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39223: Summary: implement skew and kurt in Rolling/RollingGroupby/Expanding/ExpandingGroupby Key: SPARK-39223 URL: https://issues.apache.org/jira/browse/SPARK-39223

[jira] [Created] (SPARK-39192) make pandas-on-spark's kurt consistent with pandas

2022-05-16 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39192: Summary: make pandas-on-spark's kurt consistent with pandas Key: SPARK-39192 URL: https://issues.apache.org/jira/browse/SPARK-39192 Project: Spark Issue

[jira] [Created] (SPARK-39189) interpolate supports limit_area

2022-05-15 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39189: Summary: interpolate supports limit_area Key: SPARK-39189 URL: https://issues.apache.org/jira/browse/SPARK-39189 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-39186) make skew consistent with pandas

2022-05-14 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39186: Summary: make skew consistent with pandas Key: SPARK-39186 URL: https://issues.apache.org/jira/browse/SPARK-39186 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-39129) impl Groupby.ewm

2022-05-09 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39129: Summary: impl Groupby.ewm Key: SPARK-39129 URL: https://issues.apache.org/jira/browse/SPARK-39129 Project: Spark Issue Type: Sub-task Components:

[jira] [Resolved] (SPARK-39114) ml.optim.aggregator avoid re-allocating buffers

2022-05-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-39114. -- Fix Version/s: 3.4.0 Resolution: Resolved > ml.optim.aggregator avoid re-allocating

[jira] [Created] (SPARK-39114) ml.optim.aggregator avoid re-allocating buffers

2022-05-06 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39114: Summary: ml.optim.aggregator avoid re-allocating buffers Key: SPARK-39114 URL: https://issues.apache.org/jira/browse/SPARK-39114 Project: Spark Issue Type:

[jira] [Commented] (SPARK-39058) Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers

2022-05-03 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531100#comment-17531100 ] zhengruifeng commented on SPARK-39058: -- [~weichenxu123] I can help review. BTW, is there some

[jira] [Created] (SPARK-39092) Propagate Empty Partitions

2022-05-03 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39092: Summary: Propagate Empty Partitions Key: SPARK-39092 URL: https://issues.apache.org/jira/browse/SPARK-39092 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-30661) KMeans blockify input vectors

2022-05-03 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30661: - Affects Version/s: 3.4.0 (was: 3.0.0) > KMeans blockify input

[jira] [Updated] (SPARK-30661) KMeans blockify input vectors

2022-05-03 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30661: - Priority: Major (was: Minor) > KMeans blockify input vectors > - >

[jira] [Created] (SPARK-39081) Impl DataFrame.resample and Series.resample

2022-04-30 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39081: Summary: Impl DataFrame.resample and Series.resample Key: SPARK-39081 URL: https://issues.apache.org/jira/browse/SPARK-39081 Project: Spark Issue Type:

[jira] [Created] (SPARK-38993) Impl DataFrame.boxplot and DataFrame.plot.box

2022-04-21 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38993: Summary: Impl DataFrame.boxplot and DataFrame.plot.box Key: SPARK-38993 URL: https://issues.apache.org/jira/browse/SPARK-38993 Project: Spark Issue Type:

[jira] [Created] (SPARK-38943) EWM support ignore_na

2022-04-19 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38943: Summary: EWM support ignore_na Key: SPARK-38943 URL: https://issues.apache.org/jira/browse/SPARK-38943 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-38937) interpolate support param `limit_direction`

2022-04-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38937: Summary: interpolate support param `limit_direction` Key: SPARK-38937 URL: https://issues.apache.org/jira/browse/SPARK-38937 Project: Spark Issue Type:

[jira] [Created] (SPARK-38907) Impl DataFrame.corrwith

2022-04-14 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38907: Summary: Impl DataFrame.corrwith Key: SPARK-38907 URL: https://issues.apache.org/jira/browse/SPARK-38907 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-38844) impl Series.interpolate and DataFrame.interpolate

2022-04-09 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38844: Summary: impl Series.interpolate and DataFrame.interpolate Key: SPARK-38844 URL: https://issues.apache.org/jira/browse/SPARK-38844 Project: Spark Issue

[jira] [Commented] (SPARK-38785) impl Series.ewm and DataFrame.ewm

2022-04-04 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516846#comment-17516846 ] zhengruifeng commented on SPARK-38785: -- h1. Pandas API on Spark:

[jira] [Created] (SPARK-38785) impl Series.ewm and DataFrame.ewm

2022-04-04 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38785: Summary: impl Series.ewm and DataFrame.ewm Key: SPARK-38785 URL: https://issues.apache.org/jira/browse/SPARK-38785 Project: Spark Issue Type: Sub-task

[jira] [Assigned] (SPARK-38775) cleanup validation functions

2022-04-02 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-38775: Assignee: zhengruifeng > cleanup validation functions > > >

[jira] [Created] (SPARK-38775) cleanup validation functions

2022-04-02 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38775: Summary: cleanup validation functions Key: SPARK-38775 URL: https://issues.apache.org/jira/browse/SPARK-38775 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-38774) impl Series.autocorr

2022-04-02 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38774: Summary: impl Series.autocorr Key: SPARK-38774 URL: https://issues.apache.org/jira/browse/SPARK-38774 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-37099) Introduce a rank-based filter to optimize top-k computation

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Affects Version/s: 3.4.0 (was: 3.3.0) > Introduce a rank-based

[jira] [Updated] (SPARK-36638) Generalize OptimizeSkewedJoin

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-36638: - Affects Version/s: 3.4.0 (was: 3.3.0) > Generalize

[jira] [Updated] (SPARK-37099) Introduce a rank-based filter to optimize top-k computation

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Summary: Introduce a rank-based filter to optimize top-k computation (was: Impl a rank-based

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2022-03-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Description: in JD, we found that more than 90% usage of window function follows this pattern:

[jira] [Created] (SPARK-38669) Validate input dataset of ml.clustering

2022-03-27 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38669: Summary: Validate input dataset of ml.clustering Key: SPARK-38669 URL: https://issues.apache.org/jira/browse/SPARK-38669 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-38643) Validate input dataset of ml.regression

2022-03-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38643: Summary: Validate input dataset of ml.regression Key: SPARK-38643 URL: https://issues.apache.org/jira/browse/SPARK-38643 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-38588) Validate input dataset of ml.classification

2022-03-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38588: - Fix Version/s: 3.4.0 > Validate input dataset of ml.classification >

[jira] [Resolved] (SPARK-38588) Validate input dataset of ml.classification

2022-03-24 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-38588. -- Resolution: Resolved > Validate input dataset of ml.classification >

[jira] [Updated] (SPARK-38588) Validate input dataset of ml.classification

2022-03-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38588: - Summary: Validate input dataset of ml.classification (was: Validate input dataset of

[jira] [Updated] (SPARK-38584) Unify the data validation

2022-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38584: - Description: 1, input vector validation is missing in most algorithms, when the input dataset

[jira] [Updated] (SPARK-38584) Unify the data validation

2022-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38584: - Description: 1, input vector validation is missing in most algorithms, when the input dataset

[jira] [Created] (SPARK-38588) Validate input dataset of LinearSVC

2022-03-17 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38588: Summary: Validate input dataset of LinearSVC Key: SPARK-38588 URL: https://issues.apache.org/jira/browse/SPARK-38588 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-38584) Unify the data validation

2022-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38584: - Description: 1, input vector validation is missing in most algorithms, when the input dataset

[jira] [Assigned] (SPARK-38584) Unify the data validation

2022-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-38584: Assignee: zhengruifeng > Unify the data validation > - > >

[jira] [Updated] (SPARK-38584) Unify the data validation

2022-03-17 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38584: - Description: 1, input vector validation is missing in most algorithms, when the input dataset

[jira] [Created] (SPARK-38584) Unify the data validation

2022-03-17 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38584: Summary: Unify the data validation Key: SPARK-38584 URL: https://issues.apache.org/jira/browse/SPARK-38584 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-38286) Union's maxRows and maxRowsPerPartition may overflow

2022-02-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38286: - Summary: Union's maxRows and maxRowsPerPartition may overflow (was: check Union's maxRows and

[jira] [Updated] (SPARK-38286) check Union's maxRows and maxRowsPerPartition

2022-02-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38286: - Description: {code:java} scala> val df1 = spark.range(0, Long.MaxValue, 1, 1) df1:

[jira] [Created] (SPARK-38286) check Union's maxRows and maxRowsPerPartition

2022-02-22 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38286: Summary: check Union's maxRows and maxRowsPerPartition Key: SPARK-38286 URL: https://issues.apache.org/jira/browse/SPARK-38286 Project: Spark Issue Type:

[jira] [Updated] (SPARK-38271) PoissonSampler may output more rows than MaxRows

2022-02-22 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38271: - Summary: PoissonSampler may output more rows than MaxRows (was: PoissonSampler may generate

[jira] [Updated] (SPARK-38271) PoissonSampler may generate more rows than MaxRows

2022-02-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38271: - Affects Version/s: 3.2.1 3.1.2 3.0.3 >

[jira] [Updated] (SPARK-38271) PoissonSampler may generate more rows than MaxRows

2022-02-21 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-38271: - Description: {code:java} scala> val df = spark.range(0, 1000) df:

[jira] [Created] (SPARK-38271) PoissonSampler may generate more rows than MaxRows

2022-02-21 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38271: Summary: PoissonSampler may generate more rows than MaxRows Key: SPARK-38271 URL: https://issues.apache.org/jira/browse/SPARK-38271 Project: Spark Issue

[jira] [Commented] (SPARK-37913) Null Pointer Exception when Loading ML Pipeline Model with Custom Transformer

2022-02-15 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492517#comment-17492517 ] zhengruifeng commented on SPARK-37913: -- Does the `MyTransformer` in the example work? > Null

[jira] [Commented] (SPARK-38037) Spark MLlib FPGrowth not working with 40+ items in Frequent Item set

2022-02-10 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490654#comment-17490654 ] zhengruifeng commented on SPARK-38037: -- I can reproduce it by: {code:java} import

[jira] [Commented] (SPARK-38139) ml.recommendation.ALS doctests failures

2022-02-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489521#comment-17489521 ] zhengruifeng commented on SPARK-38139: -- I think it is ok to adjust the tol in this case >

[jira] [Commented] (SPARK-34160) pyspark.ml.stat.Summarizer should allow sparse vector results

2022-02-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489518#comment-17489518 ] zhengruifeng commented on SPARK-34160: -- you can get a sparse vector by calling

[jira] [Resolved] (SPARK-34160) pyspark.ml.stat.Summarizer should allow sparse vector results

2022-02-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34160. -- Resolution: Not A Problem > pyspark.ml.stat.Summarizer should allow sparse vector results >

[jira] [Commented] (SPARK-34452) OneVsRest with GBTClassifier throws InternalCompilerException in 3.1.0

2022-02-09 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489517#comment-17489517 ] zhengruifeng commented on SPARK-34452: -- I can not reproduce this issue in 3.1.2, could you please

[jira] [Commented] (SPARK-37285) Add Weight of Evidence and Information value to ml.feature

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489301#comment-17489301 ] zhengruifeng commented on SPARK-37285: -- were these metrics or algorithms implemented in

[jira] [Commented] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489278#comment-17489278 ] zhengruifeng commented on SPARK-36553: -- it is an overflow: {code:java} scala> val k = 5 val

[jira] [Commented] (SPARK-30661) KMeans blockify input vectors

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489271#comment-17489271 ] zhengruifeng commented on SPARK-30661: -- ok, I will skip .mllib calling .ml here. We may re-org the

[jira] [Comment Edited] (SPARK-31007) KMeans optimization based on triangle-inequality

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489270#comment-17489270 ] zhengruifeng edited comment on SPARK-31007 at 2/9/22, 6:05 AM: ---   this

[jira] [Commented] (SPARK-31007) KMeans optimization based on triangle-inequality

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489270#comment-17489270 ] zhengruifeng commented on SPARK-31007: -- this case is not an OOM, but an overflow: {code:java}

[jira] [Commented] (SPARK-36714) bugs in MIniLSH

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489258#comment-17489258 ] zhengruifeng commented on SPARK-36714: -- [~sheng_1992] Since you have investigated this issue, feel

[jira] [Commented] (SPARK-36714) bugs in MIniLSH

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489232#comment-17489232 ] zhengruifeng commented on SPARK-36714: -- could you please provide a simple script to reproduce this

[jira] [Commented] (SPARK-31007) KMeans optimization based on triangle-inequality

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489230#comment-17489230 ] zhengruifeng commented on SPARK-31007: -- [~srowen]  This optimization needs an array of size  val

[jira] [Commented] (SPARK-38037) Spark MLlib FPGrowth not working with 40+ items in Frequent Item set

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489225#comment-17489225 ] zhengruifeng commented on SPARK-38037: -- could you please provide a simple script to reproduce this

[jira] [Assigned] (SPARK-33882) Add a vectorized BLAS implementation

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-33882: Assignee: Ludovic Henry (was: zhengruifeng) > Add a vectorized BLAS implementation >

[jira] [Assigned] (SPARK-33882) Add a vectorized BLAS implementation

2022-02-08 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-33882: Assignee: zhengruifeng (was: Ludovic Henry) > Add a vectorized BLAS implementation >

[jira] [Commented] (SPARK-30661) KMeans blockify input vectors

2022-02-07 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488611#comment-17488611 ] zhengruifeng commented on SPARK-30661: -- since the input datasets of kmeans are likely dense, so I

[jira] [Comment Edited] (SPARK-30661) KMeans blockify input vectors

2022-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479793#comment-17479793 ] zhengruifeng edited comment on SPARK-30661 at 1/21/22, 2:54 AM:

[jira] [Updated] (SPARK-30661) KMeans blockify input vectors

2022-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30661: - Attachment: blockify_kmeans.png > KMeans blockify input vectors > -

[jira] [Commented] (SPARK-30661) KMeans blockify input vectors

2022-01-20 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479793#comment-17479793 ] zhengruifeng commented on SPARK-30661: -- according to

[jira] [Created] (SPARK-37961) override maxRows/maxRowsPerPartition for some logical operators

2022-01-19 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-37961: Summary: override maxRows/maxRowsPerPartition for some logical operators Key: SPARK-37961 URL: https://issues.apache.org/jira/browse/SPARK-37961 Project: Spark

[jira] [Commented] (SPARK-30661) KMeans blockify input vectors

2022-01-19 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478437#comment-17478437 ] zhengruifeng commented on SPARK-30661: -- recently, I spent some time testing blockify kmeans and

[jira] [Created] (SPARK-37959) Fix the UT of checking norm in KMeans & BiKMeans

2022-01-19 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-37959: Summary: Fix the UT of checking norm in KMeans & BiKMeans Key: SPARK-37959 URL: https://issues.apache.org/jira/browse/SPARK-37959 Project: Spark Issue Type:

[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation

2021-12-31 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-37099: - Attachment: q67.png q67_optimized.png > Impl a rank-based filter to optimize
