[jira] [Updated] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SimonAries updated SPARK-40441: --- Description: {code:java} // code placeholder import json import pandas as pd import pyspark.sql.functions as F i

[jira] [Updated] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SimonAries updated SPARK-40441: --- Description: {code:java} // code placeholder import json import pandas as pd import pyspark.sql.functions as F i

[jira] [Commented] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605134#comment-17605134 ] SimonAries commented on SPARK-40441: I hope the big guys give me some pointers > Wi

[jira] [Updated] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SimonAries updated SPARK-40441: --- Description: This caused the data skew to be very serious, and I did repartition operation before e

[jira] [Updated] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SimonAries updated SPARK-40441: --- Description: This caused the data skew to be very serious, and I did repartition operation before e

[jira] [Updated] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SimonAries updated SPARK-40441: --- Attachment: image-2022-09-15-14-29-35-004.png > With PANDAS_UDF, data from tasks on the same physica

[jira] [Updated] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SimonAries updated SPARK-40441: --- Attachment: image-2022-09-15-14-28-04-332.png > With PANDAS_UDF, data from tasks on the same physica

[jira] [Created] (SPARK-40441) With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized

2022-09-14 Thread SimonAries (Jira)
SimonAries created SPARK-40441: -- Summary: With PANDAS_UDF, data from tasks on the same physical node is aggregated into one task execution, resulting in concurrency not being fully utilized Key: SPARK-40441 URL: htt

[jira] [Created] (SPARK-40440) Fix wrong reference and content in PS windows related doc

2022-09-14 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40440: --- Summary: Fix wrong reference and content in PS windows related doc Key: SPARK-40440 URL: https://issues.apache.org/jira/browse/SPARK-40440 Project: Spark Issue

[jira] [Resolved] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40429. --- Fix Version/s: 3.4.0 Assignee: Huaxin Gao Resolution: Fixed This is resolved

[jira] [Updated] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-14 Thread xsys (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xsys updated SPARK-40439: - Description: h3. Describe the bug We are trying to store a DECIMAL value {{333.22}} with more

[jira] [Updated] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-14 Thread xsys (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xsys updated SPARK-40439: - Description: h3. Describe the bug We are trying to store a DECIMAL value {{333.22}} with more

[jira] [Updated] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-14 Thread xsys (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xsys updated SPARK-40439: - Description: h3. Describe the bug We are trying to store a DECIMAL value {{333.22}} with more

[jira] [Updated] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-14 Thread xsys (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xsys updated SPARK-40439: - Description: h3. Describe the bug We are trying to store a DECIMAL value {{333.22}} with more

[jira] [Created] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-14 Thread xsys (Jira)
xsys created SPARK-40439: Summary: DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame Key: SPARK-40439 URL: https://issues.apache.org/jira

[jira] [Updated] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-14 Thread xsys (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xsys updated SPARK-40439: - Description: h3. Describe the bug We are trying to store a DECIMAL value {{333.22}} with more

[jira] [Commented] (SPARK-40435) Add test suites for applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605101#comment-17605101 ] Apache Spark commented on SPARK-40435: -- User 'HeartSaVioR' has created a pull reque

[jira] [Commented] (SPARK-40435) Add test suites for applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605100#comment-17605100 ] Apache Spark commented on SPARK-40435: -- User 'HeartSaVioR' has created a pull reque

[jira] [Assigned] (SPARK-40435) Add test suites for applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40435: Assignee: (was: Apache Spark) > Add test suites for applyInPandasWithState in PySpark

[jira] [Assigned] (SPARK-40435) Add test suites for applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40435: Assignee: Apache Spark > Add test suites for applyInPandasWithState in PySpark >

[jira] [Assigned] (SPARK-40434) Implement applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40434: Assignee: Apache Spark > Implement applyInPandasWithState in PySpark > --

[jira] [Assigned] (SPARK-40434) Implement applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40434: Assignee: (was: Apache Spark) > Implement applyInPandasWithState in PySpark > ---

[jira] [Commented] (SPARK-40434) Implement applyInPandasWithState in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605095#comment-17605095 ] Apache Spark commented on SPARK-40434: -- User 'HeartSaVioR' has created a pull reque

[jira] [Commented] (SPARK-40437) Support string representation of durationMs in GroupState.setTimeoutDuration

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605089#comment-17605089 ] Hyukjin Kwon commented on SPARK-40437: -- [~kabhwan] I didn't add this to SPARK-40431

[jira] [Commented] (SPARK-40438) Support additionalDuration parameter in GroupState.setTimeoutTimestamp

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605088#comment-17605088 ] Hyukjin Kwon commented on SPARK-40438: -- [~kabhwan] I didn't add this to SPARK-40431

[jira] [Updated] (SPARK-40438) Support additionalDuration parameter in GroupState.setTimeoutTimestamp

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40438: - Summary: Support additionalDuration parameter in GroupState.setTimeoutTimestamp (was: Support

[jira] [Updated] (SPARK-40438) Support additionalDuration parameter in GroupState.setTimeoutTimestamp

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40438: - Description: GroupState.setTimeoutTimestamp should support additionalDuration parameter to match

[jira] [Updated] (SPARK-40438) Support in GroupState.setTimeoutTimestamp

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40438: - Summary: Support in GroupState.setTimeoutTimestamp (was: Implement additionalDuration paramete

[jira] [Updated] (SPARK-40437) Support string representation of durationMs in GroupState.setTimeoutDuration

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40437: - Summary: Support string representation of durationMs in GroupState.setTimeoutDuration (was: Sup

[jira] [Updated] (SPARK-40438) Implement additionalDuration parameter in GroupState

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40438: - Description: GroupStateImpl.additionalDuration should support string representation to match wit

[jira] [Updated] (SPARK-40438) Support additionalDuration parameter in GroupState.setTimeoutTimestamp

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40438: - Description: GroupState.setTimeoutTimestamp should support string representation to match with S

[jira] [Updated] (SPARK-40438) Implement additionalDuration parameter in GroupState

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40438: - Description: GroupStateImpl.setTimeoutDuration should support string representation to match wit

[jira] [Created] (SPARK-40438) Implement additionalDuration parameter in GroupState

2022-09-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-40438: Summary: Implement additionalDuration parameter in GroupState Key: SPARK-40438 URL: https://issues.apache.org/jira/browse/SPARK-40438 Project: Spark Issue Ty

[jira] [Created] (SPARK-40437) Support string representation of durationMs in GroupState

2022-09-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-40437: Summary: Support string representation of durationMs in GroupState Key: SPARK-40437 URL: https://issues.apache.org/jira/browse/SPARK-40437 Project: Spark Iss

[jira] [Updated] (SPARK-40437) Support string representation of durationMs in GroupState

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40437: - Priority: Minor (was: Major) > Support string representation of durationMs in GroupState >

[jira] [Commented] (SPARK-40436) Upgrade Scala to 2.12.17

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605082#comment-17605082 ] Apache Spark commented on SPARK-40436: -- User 'LuciferYang' has created a pull reque

[jira] [Assigned] (SPARK-40436) Upgrade Scala to 2.12.17

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40436: Assignee: (was: Apache Spark) > Upgrade Scala to 2.12.17 > >

[jira] [Assigned] (SPARK-40436) Upgrade Scala to 2.12.17

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40436: Assignee: Apache Spark > Upgrade Scala to 2.12.17 > > >

[jira] [Commented] (SPARK-40436) Upgrade Scala to 2.12.17

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605081#comment-17605081 ] Apache Spark commented on SPARK-40436: -- User 'LuciferYang' has created a pull reque

[jira] [Assigned] (SPARK-40433) Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40433: Assignee: (was: Apache Spark) > Add toJVMRow in PythonSQLUtils to convert pickled PyS

[jira] [Assigned] (SPARK-40433) Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40433: Assignee: Apache Spark > Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to

[jira] [Created] (SPARK-40436) Upgrade Scala to 2.12.17

2022-09-14 Thread Yang Jie (Jira)
Yang Jie created SPARK-40436: Summary: Upgrade Scala to 2.12.17 Key: SPARK-40436 URL: https://issues.apache.org/jira/browse/SPARK-40436 Project: Spark Issue Type: Improvement Components

[jira] [Commented] (SPARK-40433) Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605073#comment-17605073 ] Apache Spark commented on SPARK-40433: -- User 'HeartSaVioR' has created a pull reque

[jira] [Commented] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605072#comment-17605072 ] Apache Spark commented on SPARK-40339: -- User 'HyukjinKwon' has created a pull reque

[jira] [Commented] (SPARK-40342) Implement `Rolling.quantile`.

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605071#comment-17605071 ] Apache Spark commented on SPARK-40342: -- User 'HyukjinKwon' has created a pull reque

[jira] [Commented] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605070#comment-17605070 ] Apache Spark commented on SPARK-40339: -- User 'HyukjinKwon' has created a pull reque

[jira] [Commented] (SPARK-40432) Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605068#comment-17605068 ] Apache Spark commented on SPARK-40432: -- User 'HeartSaVioR' has created a pull reque

[jira] [Commented] (SPARK-40432) Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605067#comment-17605067 ] Apache Spark commented on SPARK-40432: -- User 'HeartSaVioR' has created a pull reque

[jira] [Assigned] (SPARK-40432) Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40432: Assignee: (was: Apache Spark) > Introduce GroupStateImpl and GroupStateTimeout in PyS

[jira] [Assigned] (SPARK-40432) Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40432: Assignee: Apache Spark > Introduce GroupStateImpl and GroupStateTimeout in PySpark >

[jira] [Created] (SPARK-40435) Add test suites for applyInPandasWithState in PySpark

2022-09-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-40435: Summary: Add test suites for applyInPandasWithState in PySpark Key: SPARK-40435 URL: https://issues.apache.org/jira/browse/SPARK-40435 Project: Spark Issue T

[jira] [Created] (SPARK-40434) Implement applyInPandasWithState in PySpark

2022-09-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-40434: Summary: Implement applyInPandasWithState in PySpark Key: SPARK-40434 URL: https://issues.apache.org/jira/browse/SPARK-40434 Project: Spark Issue Type: Sub-t

[jira] [Created] (SPARK-40433) Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row

2022-09-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-40433: Summary: Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row Key: SPARK-40433 URL: https://issues.apache.org/jira/browse/SPARK-40433 Project: Spa

[jira] [Created] (SPARK-40432) Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-40432: Summary: Introduce GroupStateImpl and GroupStateTimeout in PySpark Key: SPARK-40432 URL: https://issues.apache.org/jira/browse/SPARK-40432 Project: Spark Iss

[jira] [Commented] (SPARK-40431) Introduce "Arbitrary Stateful Processing" in Structured Streaming with Python

2022-09-14 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605063#comment-17605063 ] Jungtaek Lim commented on SPARK-40431: -- This is a joint effort between me and [~hyukji

[jira] [Created] (SPARK-40431) Introduce "Arbitrary Stateful Processing" in Structured Streaming with Python

2022-09-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-40431: Summary: Introduce "Arbitrary Stateful Processing" in Structured Streaming with Python Key: SPARK-40431 URL: https://issues.apache.org/jira/browse/SPARK-40431 Project

[jira] [Resolved] (SPARK-40421) Make `spearman` correlation in `DataFrame.corr` support missing values and `min_periods`

2022-09-14 Thread Ruifeng Zheng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-40421. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37874 [https://

[jira] [Assigned] (SPARK-40421) Make `spearman` correlation in `DataFrame.corr` support missing values and `min_periods`

2022-09-14 Thread Ruifeng Zheng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-40421: - Assignee: Ruifeng Zheng > Make `spearman` correlation in `DataFrame.corr` support missi

[jira] [Updated] (SPARK-40430) Spark session does not update number of files for partition

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40430: - Component/s: SQL (was: Spark Core) > Spark session does not update number o

[jira] [Resolved] (SPARK-40426) Return a map from SparkThrowable.getMessageParameters

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40426. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37871 [https://gi

[jira] [Resolved] (SPARK-40342) Implement `Rolling.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40342. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37836 [https://gi

[jira] [Resolved] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40339. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37836 [https://gi

[jira] [Resolved] (SPARK-40345) Implement `ExpandingGroupby.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40345. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37836 [https://gi

[jira] [Assigned] (SPARK-40345) Implement `ExpandingGroupby.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40345: Assignee: Yikun Jiang > Implement `ExpandingGroupby.quantile`. >

[jira] [Assigned] (SPARK-40342) Implement `Rolling.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40342: Assignee: Yikun Jiang > Implement `Rolling.quantile`. > - > >

[jira] [Assigned] (SPARK-40348) Implement `RollingGroupby.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40348: Assignee: Yikun Jiang > Implement `RollingGroupby.quantile`. > --

[jira] [Resolved] (SPARK-40348) Implement `RollingGroupby.quantile`.

2022-09-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40348. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37836 [https://gi

[jira] [Resolved] (SPARK-40397) Migrate selenium-java from 3.1 to 4.2 and upgrade org.scalatestplus:selenium to 3.2.13.0

2022-09-14 Thread Kousuke Saruta (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-40397. Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed Issue resolved i

[jira] [Assigned] (SPARK-40334) Implement `GroupBy.prod`.

2022-09-14 Thread Ruifeng Zheng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-40334: - Assignee: Artsiom Yudovin (was: Haejoon Lee) > Implement `GroupBy.prod`. > ---

[jira] [Commented] (SPARK-40196) Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605013#comment-17605013 ] Apache Spark commented on SPARK-40196: -- User 'xinrong-meng' has created a pull requ

[jira] [Commented] (SPARK-40196) Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605012#comment-17605012 ] Apache Spark commented on SPARK-40196: -- User 'xinrong-meng' has created a pull requ

[jira] [Assigned] (SPARK-40196) Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40196: Assignee: (was: Apache Spark) > Consolidate `lit` function with NumPy scalar in sql a

[jira] [Assigned] (SPARK-40196) Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40196: Assignee: Apache Spark > Consolidate `lit` function with NumPy scalar in sql and pandas m

[jira] [Updated] (SPARK-40196) Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-14 Thread Xinrong Meng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40196: - Description: Per [https://github.com/apache/spark/pull/37560#discussion_r952882996,] function `

[jira] [Updated] (SPARK-40196) Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-14 Thread Xinrong Meng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40196: - Summary: Consolidate `lit` function with NumPy scalar in sql and pandas module (was: Consolidat

[jira] [Commented] (SPARK-40360) Convert some DDL exception to new error framework

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605008#comment-17605008 ] Apache Spark commented on SPARK-40360: -- User 'srielau' has created a pull request f

[jira] [Updated] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Huaxin Gao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-40429: --- Description: {code:java} sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)

[jira] [Updated] (SPARK-40430) Spark session does not update number of files for partition

2022-09-14 Thread Filipe Souza (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filipe Souza updated SPARK-40430: - Attachment: session 2.png session 1.png > Spark session does not update number o

[jira] [Created] (SPARK-40430) Spark session does not update number of files for partition

2022-09-14 Thread Filipe Souza (Jira)
Filipe Souza created SPARK-40430: Summary: Spark session does not update number of files for partition Key: SPARK-40430 URL: https://issues.apache.org/jira/browse/SPARK-40430 Project: Spark

[jira] [Assigned] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40429: Assignee: (was: Apache Spark) > Only set KeyGroupedPartitioning when the referenced c

[jira] [Commented] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604961#comment-17604961 ] Apache Spark commented on SPARK-40429: -- User 'huaxingao' has created a pull request

[jira] [Assigned] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40429: Assignee: Apache Spark > Only set KeyGroupedPartitioning when the referenced column is in

[jira] [Created] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-40429: -- Summary: Only set KeyGroupedPartitioning when the referenced column is in the output Key: SPARK-40429 URL: https://issues.apache.org/jira/browse/SPARK-40429 Project: Spar

[jira] [Commented] (SPARK-40428) Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604917#comment-17604917 ] Apache Spark commented on SPARK-40428: -- User 'holdenk' has created a pull request f

[jira] [Commented] (SPARK-40428) Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604916#comment-17604916 ] Apache Spark commented on SPARK-40428: -- User 'holdenk' has created a pull request f

[jira] [Assigned] (SPARK-40428) Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40428: Assignee: Apache Spark > Add a shutdownhook to CoarseGrained scheduler to avoid dangling

[jira] [Assigned] (SPARK-40428) Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40428: Assignee: (was: Apache Spark) > Add a shutdownhook to CoarseGrained scheduler to avoi

[jira] [Created] (SPARK-40428) Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown

2022-09-14 Thread Holden Karau (Jira)
Holden Karau created SPARK-40428: Summary: Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown Key: SPARK-40428 URL: https://issues.apache.org/jira/browse/SPARK-40428

[jira] [Assigned] (SPARK-40427) Add error classes for LIMIT/OFFSET CheckAnalysis failures

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40427: Assignee: Apache Spark > Add error classes for LIMIT/OFFSET CheckAnalysis failures >

[jira] [Assigned] (SPARK-40427) Add error classes for LIMIT/OFFSET CheckAnalysis failures

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40427: Assignee: (was: Apache Spark) > Add error classes for LIMIT/OFFSET CheckAnalysis fail

[jira] [Commented] (SPARK-40427) Add error classes for LIMIT/OFFSET CheckAnalysis failures

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604883#comment-17604883 ] Apache Spark commented on SPARK-40427: -- User 'dtenedor' has created a pull request

[jira] [Created] (SPARK-40427) Add error classes for LIMIT/OFFSET CheckAnalysis failures

2022-09-14 Thread Daniel (Jira)
Daniel created SPARK-40427: -- Summary: Add error classes for LIMIT/OFFSET CheckAnalysis failures Key: SPARK-40427 URL: https://issues.apache.org/jira/browse/SPARK-40427 Project: Spark Issue Type: Sub

[jira] [Commented] (SPARK-38017) Fix the API doc for window to say it supports TimestampNTZType too as timeColumn

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604859#comment-17604859 ] Apache Spark commented on SPARK-38017: -- User 'sarutak' has created a pull request f

[jira] [Commented] (SPARK-38017) Fix the API doc for window to say it supports TimestampNTZType too as timeColumn

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604858#comment-17604858 ] Apache Spark commented on SPARK-38017: -- User 'sarutak' has created a pull request f

[jira] [Commented] (SPARK-38017) Fix the API doc for window to say it supports TimestampNTZType too as timeColumn

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604857#comment-17604857 ] Apache Spark commented on SPARK-38017: -- User 'sarutak' has created a pull request f

[jira] [Assigned] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40169: Assignee: Apache Spark > Fix the issue with Parquet column index and predicate pushdown i

[jira] [Commented] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604855#comment-17604855 ] Apache Spark commented on SPARK-40169: -- User 'sunchao' has created a pull request f

[jira] [Assigned] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40169: Assignee: (was: Apache Spark) > Fix the issue with Parquet column index and predicate

[jira] [Commented] (SPARK-40334) Implement `GroupBy.prod`.

2022-09-14 Thread Artsiom Yudovin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604841#comment-17604841 ] Artsiom Yudovin commented on SPARK-40334: - Got you, thank you so much! > Implem

[jira] [Updated] (SPARK-40423) Add explicit YuniKorn queue submission test coverage

2022-09-14 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40423: -- Fix Version/s: 3.3.2 (was: 3.3.1) > Add explicit YuniKorn queue submiss
