[jira] [Assigned] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter
[ https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41648:
------------------------------------

    Assignee: Apache Spark

> Deduplicate docstrings in pyspark.sql.connect.readwriter
> ---------------------------------------------------------
>
>                Key: SPARK-41648
>                URL: https://issues.apache.org/jira/browse/SPARK-41648
>            Project: Spark
>         Issue Type: Sub-task
>         Components: Connect
>   Affects Versions: 3.4.0
>           Reporter: Hyukjin Kwon
>           Assignee: Apache Spark
>           Priority: Major

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter
[ https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41648:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter
[ https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650178#comment-17650178 ]

Apache Spark commented on SPARK-41648:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39153
[jira] [Commented] (SPARK-41660) only propagate metadata columns if they are used
[ https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650134#comment-17650134 ]

Apache Spark commented on SPARK-41660:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39152

> only propagate metadata columns if they are used
> ------------------------------------------------
>
>                Key: SPARK-41660
>                URL: https://issues.apache.org/jira/browse/SPARK-41660
>            Project: Spark
>         Issue Type: Improvement
>         Components: SQL
>   Affects Versions: 3.4.0
>           Reporter: Wenchen Fan
>           Priority: Major
[jira] [Assigned] (SPARK-41660) only propagate metadata columns if they are used
[ https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41660:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-41660) only propagate metadata columns if they are used
[ https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41660:
------------------------------------

    Assignee: Apache Spark
[jira] [Created] (SPARK-41660) only propagate metadata columns if they are used
Wenchen Fan created SPARK-41660:
--------------------------------

             Summary: only propagate metadata columns if they are used
                 Key: SPARK-41660
                 URL: https://issues.apache.org/jira/browse/SPARK-41660
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-41653) Test parity: enable doctests in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-41653:
------------------------------------

    Assignee: Hyukjin Kwon

> Test parity: enable doctests in Spark Connect
> ---------------------------------------------
>
>                Key: SPARK-41653
>                URL: https://issues.apache.org/jira/browse/SPARK-41653
>            Project: Spark
>         Issue Type: Umbrella
>         Components: Connect
>   Affects Versions: 3.4.0
>           Reporter: Hyukjin Kwon
>           Assignee: Hyukjin Kwon
>           Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
> to Spark Connect modules, and add the module into
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507
[jira] [Assigned] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe
[ https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-41651:
------------------------------------

    Assignee: Hyukjin Kwon

> Test parity: pyspark.sql.tests.test_dataframe
> ---------------------------------------------
>
>                Key: SPARK-41651
>                URL: https://issues.apache.org/jira/browse/SPARK-41651
>            Project: Spark
>         Issue Type: Umbrella
>         Components: Connect
>   Affects Versions: 3.4.0
>           Reporter: Hyukjin Kwon
>           Assignee: Hyukjin Kwon
>           Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse
> the same test cases; see
> {{python/pyspark/sql/tests/connect/test_parity_dataframe.py}}.
> We should remove all the test cases defined there, and fix Spark Connect
> behaviours accordingly.
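The parity mechanism from SPARK-41528 that this issue builds on writes each test once and runs it against both backends. A minimal sketch of that pattern in plain `unittest` (the class names and the `make_backend` hook are illustrative stand-ins, not the actual PySpark test classes):

```python
import unittest


class DataFrameTestsMixin:
    """Test methods written once and shared by both suites."""

    def make_backend(self):
        # Each concrete suite supplies its own backend here.
        raise NotImplementedError

    def test_upper(self):
        # The same assertion exercises whichever backend is plugged in.
        self.assertEqual(self.make_backend()("spark"), "SPARK")


class ClassicDataFrameTests(DataFrameTestsMixin, unittest.TestCase):
    def make_backend(self):
        return str.upper  # stands in for the classic PySpark backend


class ParityDataFrameTests(DataFrameTestsMixin, unittest.TestCase):
    # In a real parity suite, tests that do not yet pass on Spark Connect
    # are overridden and skipped; removing those overrides is the goal.
    def make_backend(self):
        return str.upper  # stands in for the Spark Connect backend
```

Run with `python -m unittest`; both suites execute the shared `test_upper`, so a behaviour gap between backends shows up as a failure in exactly one of them.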
[jira] [Assigned] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions
[ https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-41652:
------------------------------------

    Assignee: Hyukjin Kwon

> Test parity: pyspark.sql.tests.test_functions
> ---------------------------------------------
>
>                Key: SPARK-41652
>                URL: https://issues.apache.org/jira/browse/SPARK-41652
>            Project: Spark
>         Issue Type: Umbrella
>         Components: Connect
>   Affects Versions: 3.4.0
>           Reporter: Hyukjin Kwon
>           Assignee: Hyukjin Kwon
>           Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse
> the same test cases; see
> {{python/pyspark/sql/tests/connect/test_parity_functions.py}}.
> We should remove all the test cases defined there, and fix Spark Connect
> behaviours accordingly.
[jira] [Commented] (SPARK-41653) Test parity: enable doctests in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650122#comment-17650122 ]

Hyukjin Kwon commented on SPARK-41653:
--------------------------------------

cc jiaan.geng and Deng Ziming in case you guys are interested in this.
[jira] [Commented] (SPARK-41653) Test parity: enable doctests in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650121#comment-17650121 ]

Hyukjin Kwon commented on SPARK-41653:
--------------------------------------

If the file is too big, feel free to split the JIRA or create multiple
follow-ups (e.g., pyspark.sql.connect.functions).
[jira] [Created] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter
Hyukjin Kwon created SPARK-41659:
---------------------------------

             Summary: Enable doctests in pyspark.sql.connect.readwriter
                 Key: SPARK-41659
                 URL: https://issues.apache.org/jira/browse/SPARK-41659
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-41657) Enable doctests in pyspark.sql.connect.session
Hyukjin Kwon created SPARK-41657:
---------------------------------

             Summary: Enable doctests in pyspark.sql.connect.session
                 Key: SPARK-41657
                 URL: https://issues.apache.org/jira/browse/SPARK-41657
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions
Hyukjin Kwon created SPARK-41658:
---------------------------------

             Summary: Enable doctests in pyspark.sql.connect.functions
                 Key: SPARK-41658
                 URL: https://issues.apache.org/jira/browse/SPARK-41658
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-41655) Enable doctests in pyspark.sql.connect.column
Hyukjin Kwon created SPARK-41655:
---------------------------------

             Summary: Enable doctests in pyspark.sql.connect.column
                 Key: SPARK-41655
                 URL: https://issues.apache.org/jira/browse/SPARK-41655
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-41654) Enable doctests in pyspark.sql.connect.window
Hyukjin Kwon created SPARK-41654:
---------------------------------

             Summary: Enable doctests in pyspark.sql.connect.window
                 Key: SPARK-41654
                 URL: https://issues.apache.org/jira/browse/SPARK-41654
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe
Hyukjin Kwon created SPARK-41656:
---------------------------------

             Summary: Enable doctests in pyspark.sql.connect.dataframe
                 Key: SPARK-41656
                 URL: https://issues.apache.org/jira/browse/SPARK-41656
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-41653) Test parity: enable doctests in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41653:
---------------------------------

     Epic Link: SPARK-39375
    Issue Type: Umbrella  (was: Bug)
[jira] [Updated] (SPARK-41653) Test parity: enable doctests in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41653:
---------------------------------

        Parent: (was: SPARK-39375)
    Issue Type: Bug  (was: Sub-task)
[jira] [Created] (SPARK-41653) Test parity: enable doctests in Spark Connect
Hyukjin Kwon created SPARK-41653:
---------------------------------

             Summary: Test parity: enable doctests in Spark Connect
                 Key: SPARK-41653
                 URL: https://issues.apache.org/jira/browse/SPARK-41653
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon

We should actually run the doctests of Spark Connect.
We should add something like
https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
to Spark Connect modules, and add the module into
https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507
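The linked lines in column.py are a per-module doctest runner. A minimal sketch of that pattern, with an illustrative function standing in for the real module contents (the `_test` name mirrors the PySpark convention; the example function is made up):

```python
import doctest
import sys


def sharpen(text: str) -> str:
    """Trim whitespace and collapse internal runs of spaces.

    The example below doubles as a doctest, so it is verified on every run:

    >>> sharpen("  hello   world  ")
    'hello world'
    """
    return " ".join(text.split())


def _test() -> None:
    # Run every doctest in this module and exit non-zero on failure,
    # so the test harness treats a broken example as a test failure.
    failures, _ = doctest.testmod(verbose=False)
    if failures:
        sys.exit(-1)


if __name__ == "__main__":
    _test()
```

Registering the module in dev/sparktestsupport/modules.py then makes the harness invoke it as `python -m <module>`, which hits the `_test()` entry point.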
[jira] [Updated] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe
[ https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41651:
---------------------------------

    Description:
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse
the same test cases; see
{{python/pyspark/sql/tests/connect/test_parity_dataframe.py}}.
We should remove all the test cases defined there, and fix Spark Connect
behaviours accordingly.

    was:
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuses
the same test cases, see
`python/pyspark/sql/tests/connect/test_parity_dataframe.py`.
We should remove all the test cases defined there, and fix Spark Connect
behaviours accordingly.
[jira] [Updated] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions
[ https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41652:
---------------------------------

    Description:
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse
the same test cases; see
{{python/pyspark/sql/tests/connect/test_parity_functions.py}}.
We should remove all the test cases defined there, and fix Spark Connect
behaviours accordingly.

    was:
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuses
the same test cases, see
`python/pyspark/sql/tests/connect/test_parity_functions.py`.
We should remove all the test cases defined there, and fix Spark Connect
behaviours accordingly.
[jira] [Commented] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe
[ https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650118#comment-17650118 ]

Hyukjin Kwon commented on SPARK-41651:
--------------------------------------

cc [~beliefer] and [~dengziming] in case you guys are interested in this.
[jira] [Commented] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions
[ https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650119#comment-17650119 ]

Hyukjin Kwon commented on SPARK-41652:
--------------------------------------

cc [~beliefer] and [~dengziming] in case you guys are interested in this.
[jira] [Commented] (SPARK-41642) Deduplicate docstrings in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650117#comment-17650117 ]

Hyukjin Kwon commented on SPARK-41642:
--------------------------------------

cc [~beliefer] and [~dengziming] in case you guys are interested in this.

> Deduplicate docstrings in Python Spark Connect
> ----------------------------------------------
>
>                Key: SPARK-41642
>                URL: https://issues.apache.org/jira/browse/SPARK-41642
>            Project: Spark
>         Issue Type: Umbrella
>         Components: Connect
>   Affects Versions: 3.4.0
>           Reporter: Hyukjin Kwon
>           Assignee: Hyukjin Kwon
>           Priority: Major
>
> There are a lot of duplications in the current docstrings in PySpark Spark
> Connect API side. We should deduplicate them all.
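One common way to deduplicate docstrings is to keep a single canonical docstring and copy it onto the thin re-implementation at definition time. A minimal sketch of that idea in plain Python (the `copy_doc` helper and both function names are illustrative, not the actual PySpark mechanism):

```python
def copy_doc(source):
    """Decorator that copies ``source``'s docstring onto the target."""
    def wrapper(target):
        target.__doc__ = source.__doc__
        return target
    return wrapper


# Canonical API carrying the full docstring (stands in for pyspark.sql).
def repeat(col, n):
    """Repeat the string column ``n`` times and return a new column."""
    raise NotImplementedError


# Thin counterpart (stands in for pyspark.sql.connect): instead of
# duplicating the text, it reuses the canonical docstring verbatim.
@copy_doc(repeat)
def connect_repeat(col, n):
    raise NotImplementedError
```

With this, `help(connect_repeat)` shows the same text as `help(repeat)`, and a docstring fix in one place propagates to both.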
[jira] [Updated] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions
[ https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41652:
---------------------------------

    Epic Link: SPARK-39375
[jira] [Created] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions
Hyukjin Kwon created SPARK-41652:
---------------------------------

             Summary: Test parity: pyspark.sql.tests.test_functions
                 Key: SPARK-41652
                 URL: https://issues.apache.org/jira/browse/SPARK-41652
             Project: Spark
          Issue Type: Umbrella
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon

After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse
the same test cases; see
`python/pyspark/sql/tests/connect/test_parity_functions.py`.
We should remove all the test cases defined there, and fix Spark Connect
behaviours accordingly.
[jira] [Commented] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions
[ https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650115#comment-17650115 ]

Hyukjin Kwon commented on SPARK-41652:
--------------------------------------

Please create a subtask and go ahead.
[jira] [Commented] (SPARK-41650) json expressions much slower in optimized mode
[ https://issues.apache.org/jira/browse/SPARK-41650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650114#comment-17650114 ]

Yi Zhang commented on SPARK-41650:
----------------------------------

[~gurwls223], [~viirya] can you help look into this?

> json expressions much slower in optimized mode
> ----------------------------------------------
>
>                Key: SPARK-41650
>                URL: https://issues.apache.org/jira/browse/SPARK-41650
>            Project: Spark
>         Issue Type: Bug
>         Components: Spark Core, Structured Streaming
>   Affects Versions: 3.2.2
>           Reporter: Yi Zhang
>           Priority: Major
>
> I noticed that Spark Structured Streaming reading Kafka JSON strings into a
> struct type is much slower in spark-3.1+ than in spark-3.0. Profiling shows
> that spark-3.0 spends most of its time evaluating subexpressions, while
> spark-3.1/3.2 spends a lot of time in writeField.
> Suspecting this may be related to SPARK-32948, I tried adding a bogus option:
> from_json($"value", mySchema, Map("bogus_key" -> "bogus_value"))
> This turns off the optimization, and the performance is much better. For
> reference, for the same number of records, it is 30 seconds vs. 3 minutes on
> a task processing 500k records. That is a big difference for a streaming job.
[jira] [Updated] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe
[ https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41651:
---------------------------------

    Issue Type: Umbrella  (was: Improvement)
[jira] [Commented] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe
[ https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650112#comment-17650112 ]

Hyukjin Kwon commented on SPARK-41651:
--------------------------------------

Please create a subtask and work on it.
[jira] [Created] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe
Hyukjin Kwon created SPARK-41651:
---------------------------------

             Summary: Test parity: pyspark.sql.tests.test_dataframe
                 Key: SPARK-41651
                 URL: https://issues.apache.org/jira/browse/SPARK-41651
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon

After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse
the same test cases; see
`python/pyspark/sql/tests/connect/test_parity_dataframe.py`.
We should remove all the test cases defined there, and fix Spark Connect
behaviours accordingly.
[jira] [Created] (SPARK-41650) json expressions much slower in optimized mode
Yi Zhang created SPARK-41650:
-----------------------------

             Summary: json expressions much slower in optimized mode
                 Key: SPARK-41650
                 URL: https://issues.apache.org/jira/browse/SPARK-41650
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Structured Streaming
    Affects Versions: 3.2.2
            Reporter: Yi Zhang

I noticed that Spark Structured Streaming reading Kafka JSON strings into a
struct type is much slower in spark-3.1+ than in spark-3.0. Profiling shows
that spark-3.0 spends most of its time evaluating subexpressions, while
spark-3.1/3.2 spends a lot of time in writeField.
Suspecting this may be related to SPARK-32948, I tried adding a bogus option:
from_json($"value", mySchema, Map("bogus_key" -> "bogus_value"))
This turns off the optimization, and the performance is much better. For
reference, for the same number of records, it is 30 seconds vs. 3 minutes on
a task processing 500k records. That is a big difference for a streaming job.
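The reporter's workaround relies on the fact that an optimizer rewrite like the one from SPARK-32948 bails out when the from_json expression carries user-supplied options, so passing any dummy option leaves the expression untouched. A toy plain-Python model of that bail-out behaviour (this is not Spark's actual optimizer code; the tuple encoding and `optimize` function are made up for illustration):

```python
# Each expression is modeled as (name, input column, schema, options).
def optimize(expr):
    """Toy optimizer rule: rewrite from_json only when it has no options,
    mirroring how the real rule skips expressions with extra options."""
    name, column, schema, options = expr
    if name == "from_json" and not options:
        return ("optimized_from_json", column, schema, options)
    return expr  # any option present: leave the original expression alone


plain = ("from_json", "value", "mySchema", {})
with_bogus = ("from_json", "value", "mySchema", {"bogus_key": "bogus_value"})
```

Here `optimize(plain)` takes the rewritten path, while `optimize(with_bogus)` is returned unchanged, which is why the bogus-option trick disables the optimization in the report above.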
[jira] [Created] (SPARK-41649) Deduplicate docstrings in pyspark.sql.connect.window
Hyukjin Kwon created SPARK-41649:
---------------------------------

             Summary: Deduplicate docstrings in pyspark.sql.connect.window
                 Key: SPARK-41649
                 URL: https://issues.apache.org/jira/browse/SPARK-41649
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Commented] (SPARK-41642) Deduplicate docstrings in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650111#comment-17650111 ] Hyukjin Kwon commented on SPARK-41642: -- If the file is too big, feel free to split the JIRA or create multiple follow-ups (e.g., pyspark.sql.connect.functions). > Deduplicate docstrings in Python Spark Connect > -- > > Key: SPARK-41642 > URL: https://issues.apache.org/jira/browse/SPARK-41642 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > There are a lot of duplications in the current docstrings in PySpark Spark > Connect API side. > We should deduplicate them all. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
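A common way to deduplicate docstrings like this is to define each docstring once on the original API and copy it onto the Connect counterpart at import time. A minimal, hypothetical sketch of the pattern (the class and method names are stand-ins, not the actual Spark code):

```python
def copy_doc(source):
    """Decorator that reuses the docstring of `source` instead of duplicating it."""
    def wrapper(target):
        target.__doc__ = source.__doc__
        return target
    return wrapper

# Stand-in for the classic readwriter API, where docstrings are maintained.
class DataFrameReader:
    def json(self, path):
        """Loads JSON files and returns the results as a DataFrame."""

# Stand-in for the Connect counterpart: no docstring of its own.
class ConnectDataFrameReader:
    @copy_doc(DataFrameReader.json)
    def json(self, path):
        # Docstring is inherited from DataFrameReader.json via the decorator.
        ...

assert ConnectDataFrameReader.json.__doc__ == DataFrameReader.json.__doc__
```

With this approach, edits to the classic docstring propagate automatically and the Connect module carries no duplicated text.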
[jira] [Created] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter
Hyukjin Kwon created SPARK-41648: Summary: Deduplicate docstrings in pyspark.sql.connect.readwriter Key: SPARK-41648 URL: https://issues.apache.org/jira/browse/SPARK-41648 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41646) Deduplicate docstrings in pyspark.sql.connect.session
Hyukjin Kwon created SPARK-41646: Summary: Deduplicate docstrings in pyspark.sql.connect.session Key: SPARK-41646 URL: https://issues.apache.org/jira/browse/SPARK-41646 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41647) Deduplicate docstrings in pyspark.sql.connect.functions
Hyukjin Kwon created SPARK-41647: Summary: Deduplicate docstrings in pyspark.sql.connect.functions Key: SPARK-41647 URL: https://issues.apache.org/jira/browse/SPARK-41647 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41645) Deduplicate docstrings in pyspark.sql.connect.dataframe
Hyukjin Kwon created SPARK-41645: Summary: Deduplicate docstrings in pyspark.sql.connect.dataframe Key: SPARK-41645 URL: https://issues.apache.org/jira/browse/SPARK-41645 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column
[ https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650108#comment-17650108 ] Apache Spark commented on SPARK-41643: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39150 > Deduplicate docstrings in pyspark.sql.connect.column > > > Key: SPARK-41643 > URL: https://issues.apache.org/jira/browse/SPARK-41643 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column
[ https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41643: Assignee: Apache Spark > Deduplicate docstrings in pyspark.sql.connect.column > > > Key: SPARK-41643 > URL: https://issues.apache.org/jira/browse/SPARK-41643 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column
[ https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41643: Assignee: (was: Apache Spark) > Deduplicate docstrings in pyspark.sql.connect.column > > > Key: SPARK-41643 > URL: https://issues.apache.org/jira/browse/SPARK-41643 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer
[ https://issues.apache.org/jira/browse/SPARK-41644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41644: Assignee: (was: Apache Spark) > Introducing SPI mechanism to make it easy for other modules to register > ProtoBufSerializer > -- > > Key: SPARK-41644 > URL: https://issues.apache.org/jira/browse/SPARK-41644 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer
[ https://issues.apache.org/jira/browse/SPARK-41644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41644: Assignee: Apache Spark > Introducing SPI mechanism to make it easy for other modules to register > ProtoBufSerializer > -- > > Key: SPARK-41644 > URL: https://issues.apache.org/jira/browse/SPARK-41644 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer
[ https://issues.apache.org/jira/browse/SPARK-41644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650100#comment-17650100 ] Apache Spark commented on SPARK-41644: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39148 > Introducing SPI mechanism to make it easy for other modules to register > ProtoBufSerializer > -- > > Key: SPARK-41644 > URL: https://issues.apache.org/jira/browse/SPARK-41644 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer
Yang Jie created SPARK-41644: Summary: Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer Key: SPARK-41644 URL: https://issues.apache.org/jira/browse/SPARK-41644 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
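The registration pattern this JIRA proposes (modules plug their own serializers into a central registry rather than the core hard-coding every serializer class) can be illustrated with a hypothetical Python analog. The real proposal uses Java's ServiceLoader-based SPI; all names below, including `ResourceProfileSerializer`, are illustrative only:

```python
# Central registry mapping a wrapper-type name to its serializer instance.
_SERIALIZERS = {}

def register_serializer(wrapper_type):
    """Class decorator: register a serializer for a given UI-data wrapper type."""
    def wrapper(cls):
        _SERIALIZERS[wrapper_type] = cls()
        return cls
    return wrapper

def serializer_for(obj):
    """Look up the serializer registered for obj's type."""
    return _SERIALIZERS[type(obj).__name__]

# Any module can now plug in its own serializer without touching the core:
@register_serializer("ResourceProfileWrapper")
class ResourceProfileSerializer:
    def serialize(self, obj):
        # Would encode obj as protobuf bytes in the real implementation.
        return b"..."
```

The benefit, as in the JIRA, is that adding a new serializer requires only registering it from the owning module, not editing a central dispatch table.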
[jira] [Updated] (SPARK-41597) Improve PySpark errors
[ https://issues.apache.org/jira/browse/SPARK-41597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-41597: Component/s: Connect Tests > Improve PySpark errors > -- > > Key: SPARK-41597 > URL: https://issues.apache.org/jira/browse/SPARK-41597 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > This ticket aims to introduce a new PySpark error framework to centralize the PySpark > error messages into a single path and make the error messages more actionable > and consistent. > This umbrella JIRA might include: > * Introduce a new error framework for PySpark. > * Migrate existing errors generated by the Python driver into error classes. > * Migrate existing errors generated by the Python worker into error classes. > * Migrate existing errors generated by Py4J into error classes. > * Introduce test utils for testing errors by their error class instead of > error messages. > * Improve the error messages. > * Documentation for the PySpark error framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41539) stats and constraints in LogicalRDD may not be in sync with output attributes
[ https://issues.apache.org/jira/browse/SPARK-41539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-41539. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39082 [https://github.com/apache/spark/pull/39082] > stats and constraints in LogicalRDD may not be in sync with output attributes > - > > Key: SPARK-41539 > URL: https://issues.apache.org/jira/browse/SPARK-41539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.4.0 > > > We encountered a case where the output of the logical plan and the optimized plan > differed in LogicalRDD (the difference was the exprId in that case), which led > to a situation where stats and constraints were out of sync with the output > attributes and eventually failed the query. > We should remap stats and constraints based on the output of the logical plan, > assuming that the output of the logical plan and the optimized plan are "slightly" > different (e.g. exprId) but "semantically" the same. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41539) stats and constraints in LogicalRDD may not be in sync with output attributes
[ https://issues.apache.org/jira/browse/SPARK-41539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-41539: Assignee: Jungtaek Lim > stats and constraints in LogicalRDD may not be in sync with output attributes > - > > Key: SPARK-41539 > URL: https://issues.apache.org/jira/browse/SPARK-41539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > > We encountered a case where the output of the logical plan and the optimized plan > differed in LogicalRDD (the difference was the exprId in that case), which led > to a situation where stats and constraints were out of sync with the output > attributes and eventually failed the query. > We should remap stats and constraints based on the output of the logical plan, > assuming that the output of the logical plan and the optimized plan are "slightly" > different (e.g. exprId) but "semantically" the same. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41634) Upgrade minimatch to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-41634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-41634. Fix Version/s: 3.4.0 Assignee: Bjørn Jørgensen Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/39143 > Upgrade minimatch to 3.1.2 > --- > > Key: SPARK-41634 > URL: https://issues.apache.org/jira/browse/SPARK-41634 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Minor > Fix For: 3.4.0 > > > [CVE-2022-3517|https://nvd.nist.gov/vuln/detail/CVE-2022-3517] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
[ https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-41587. Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/39129 > Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7 > > > Key: SPARK-41587 > URL: https://issues.apache.org/jira/browse/SPARK-41587 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41292) Window-function support
[ https://issues.apache.org/jira/browse/SPARK-41292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650073#comment-17650073 ] Apache Spark commented on SPARK-41292: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39149 > Window-function support > --- > > Key: SPARK-41292 > URL: https://issues.apache.org/jira/browse/SPARK-41292 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Blocker > > For compatibility, we need support for expressing window functions. Window > functions are different from regular unresolved expressions as they need a > window spec and are generally treated more like aggregate functions. > Part of this task is to identify if we can fully express the logic of window > functions using unresolved functions with expression arguments that represent > the window spec. > Only once this validation is done, we should consider adding a new plan > operator / expression type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41292) Window-function support
[ https://issues.apache.org/jira/browse/SPARK-41292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41292: Assignee: (was: Apache Spark) > Window-function support > --- > > Key: SPARK-41292 > URL: https://issues.apache.org/jira/browse/SPARK-41292 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Blocker > > For compatibility, we need support for expressing window functions. Window > functions are different from regular unresolved expressions as they need a > window spec and are generally treated more like aggregate functions. > Part of this task is to identify if we can fully express the logic of window > functions using unresolved functions with expression arguments that represent > the window spec. > Only once this validation is done, we should consider adding a new plan > operator / expression type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41292) Window-function support
[ https://issues.apache.org/jira/browse/SPARK-41292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41292: Assignee: Apache Spark > Window-function support > --- > > Key: SPARK-41292 > URL: https://issues.apache.org/jira/browse/SPARK-41292 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Blocker > > For compatibility, we need support for expressing window functions. Window > functions are different from regular unresolved expressions as they need a > window spec and are generally treated more like aggregate functions. > Part of this task is to identify if we can fully express the logic of window > functions using unresolved functions with expression arguments that represent > the window spec. > Only once this validation is done, we should consider adding a new plan > operator / expression type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column
Hyukjin Kwon created SPARK-41643: Summary: Deduplicate docstrings in pyspark.sql.connect.column Key: SPARK-41643 URL: https://issues.apache.org/jira/browse/SPARK-41643 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column
[ https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650071#comment-17650071 ] Hyukjin Kwon commented on SPARK-41643: -- I am working on this > Deduplicate docstrings in pyspark.sql.connect.column > > > Key: SPARK-41643 > URL: https://issues.apache.org/jira/browse/SPARK-41643 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41642) Deduplicate docstrings in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41642: Assignee: Hyukjin Kwon > Deduplicate docstrings in Python Spark Connect > -- > > Key: SPARK-41642 > URL: https://issues.apache.org/jira/browse/SPARK-41642 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > There are a lot of duplications in the current docstrings in PySpark Spark > Connect API side. > We should deduplicate them all. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41642) Deduplicate docstrings in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41642: - Description: There are a lot of duplications in the current docstrings in PySpark Spark Connect API side. We should deduplicate them all. > Deduplicate docstrings in Python Spark Connect > -- > > Key: SPARK-41642 > URL: https://issues.apache.org/jira/browse/SPARK-41642 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > There are a lot of duplications in the current docstrings in PySpark Spark > Connect API side. > We should deduplicate them all. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41642) Deduplicate docstrings in Python Spark Connect
Hyukjin Kwon created SPARK-41642: Summary: Deduplicate docstrings in Python Spark Connect Key: SPARK-41642 URL: https://issues.apache.org/jira/browse/SPARK-41642 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41426) Protobuf serializer for ResourceProfileWrapper
[ https://issues.apache.org/jira/browse/SPARK-41426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41426. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39105 [https://github.com/apache/spark/pull/39105] > Protobuf serializer for ResourceProfileWrapper > -- > > Key: SPARK-41426 > URL: https://issues.apache.org/jira/browse/SPARK-41426 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41426) Protobuf serializer for ResourceProfileWrapper
[ https://issues.apache.org/jira/browse/SPARK-41426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41426: -- Assignee: Sandeep Singh > Protobuf serializer for ResourceProfileWrapper > -- > > Key: SPARK-41426 > URL: https://issues.apache.org/jira/browse/SPARK-41426 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Sandeep Singh >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41434) Support LambdaFunction expression
[ https://issues.apache.org/jira/browse/SPARK-41434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41434. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39068 [https://github.com/apache/spark/pull/39068] > Support LambdaFunction expression > -- > > Key: SPARK-41434 > URL: https://issues.apache.org/jira/browse/SPARK-41434 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41434) Support LambdaFunction expression
[ https://issues.apache.org/jira/browse/SPARK-41434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41434: Assignee: Ruifeng Zheng > Support LambdaFunction expression > -- > > Key: SPARK-41434 > URL: https://issues.apache.org/jira/browse/SPARK-41434 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-41192: --- Assignee: Yazhi Wang > Task finished before speculative task scheduled leads to holding idle > executors > --- > > Key: SPARK-41192 > URL: https://issues.apache.org/jira/browse/SPARK-41192 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2, 3.3.1 >Reporter: Yazhi Wang >Assignee: Yazhi Wang >Priority: Minor > Labels: dynamic_allocation > Attachments: dynamic-executors, dynamic-log > > > When a task finishes before the speculative task has been scheduled by the > DAGScheduler, the speculative task will still be considered pending and > count towards the calculation of the number of needed executors, which leads > to requesting more executors than needed. > h2. Background & Reproduce > In one of our production jobs, we found that ExecutorAllocationManager was > holding more executors than needed. > It is difficult to reproduce in the test environment, so in order to > stably reproduce and debug, we temporarily commented out the scheduling code of > speculative tasks in TaskSetManager:363 to ensure that the task completes > before the speculative task is scheduled.
> {code:java} > // Original code > private def dequeueTask( > execId: String, > host: String, > maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, > Boolean)] = { > // Tries to schedule a regular task first; if it returns None, then > schedules > // a speculative task > dequeueTaskHelper(execId, host, maxLocality, false).orElse( > dequeueTaskHelper(execId, host, maxLocality, true)) > } > // Speculative task will never be scheduled > private def dequeueTask( > execId: String, > host: String, > maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, > Boolean)] = { > // Tries to schedule a regular task first; if it returns None, then > schedules > // a speculative task > dequeueTaskHelper(execId, host, maxLocality, false) > } {code} > Referring to the examples in SPARK-30511, > you will see that when running the last task we hold 38 executors (see > attachment), which is exactly (149 + 1) / 4 = 38. But actually there are only > 2 tasks running, which requires only Math.max(20, 2/4) = 20 executors. > {code:java} > ./bin/spark-shell --master yarn --conf spark.speculation=true --conf > spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf > spark.dynamicAllocation.minExecutors=20 --conf > spark.dynamicAllocation.maxExecutors=1000 {code} > {code:java} > val n = 4000 > val someRDD = sc.parallelize(1 to n, n) > someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { > if (index > 3998) { > Thread.sleep(1000 * 1000) > } else if (index > 3850) { > Thread.sleep(50 * 1000) // Fake running tasks > } else { > Thread.sleep(100) > } > Array.fill[Int](1)(1).iterator > }) {code} > > I will have a PR ready to fix this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
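The executor-count arithmetic quoted in this report ((149 + 1) / 4 = 38 executors held vs. the 20-executor floor actually needed) follows from the dynamic-allocation target formula. A simplified, illustrative sketch of that calculation (the real logic lives in Spark's ExecutorAllocationManager):

```python
import math

def target_executors(pending_tasks, running_tasks, cores_per_executor,
                     min_executors, max_executors):
    """Executors needed to run every pending and running task at once,
    clamped to the configured [min, max] range (simplified sketch)."""
    needed = math.ceil((pending_tasks + running_tasks) / cores_per_executor)
    return max(min_executors, min(max_executors, needed))

# Bug scenario: 149 speculative tasks mis-counted as pending plus 1 running
# task, 4 cores per executor -> (149 + 1) / 4 = 38 executors held.
assert target_executors(149, 1, 4, 20, 1000) == 38

# Correct accounting: only 2 tasks actually running -> clamped up to the
# spark.dynamicAllocation.minExecutors=20 floor.
assert target_executors(0, 2, 4, 20, 1000) == 20
```

This shows why stale speculative tasks inflate the target: they stay in the pending count even after the original task has finished.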
[jira] [Resolved] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-41192. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38711 [https://github.com/apache/spark/pull/38711] > Task finished before speculative task scheduled leads to holding idle > executors > --- > > Key: SPARK-41192 > URL: https://issues.apache.org/jira/browse/SPARK-41192 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2, 3.3.1 >Reporter: Yazhi Wang >Assignee: Yazhi Wang >Priority: Minor > Labels: dynamic_allocation > Fix For: 3.4.0 > > Attachments: dynamic-executors, dynamic-log > > > When a task finishes before the speculative task has been scheduled by the > DAGScheduler, the speculative task will still be considered pending and > count towards the calculation of the number of needed executors, which leads > to requesting more executors than needed. > h2. Background & Reproduce > In one of our production jobs, we found that ExecutorAllocationManager was > holding more executors than needed. > It is difficult to reproduce in the test environment, so in order to > stably reproduce and debug, we temporarily commented out the scheduling code of > speculative tasks in TaskSetManager:363 to ensure that the task completes > before the speculative task is scheduled.
> {code:java} > // Original code > private def dequeueTask( > execId: String, > host: String, > maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, > Boolean)] = { > // Tries to schedule a regular task first; if it returns None, then > schedules > // a speculative task > dequeueTaskHelper(execId, host, maxLocality, false).orElse( > dequeueTaskHelper(execId, host, maxLocality, true)) > } > // Speculative task will never be scheduled > private def dequeueTask( > execId: String, > host: String, > maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, > Boolean)] = { > // Tries to schedule a regular task first; if it returns None, then > schedules > // a speculative task > dequeueTaskHelper(execId, host, maxLocality, false) > } {code} > Referring to the examples in SPARK-30511, > you will see that when running the last task we hold 38 executors (see > attachment), which is exactly (149 + 1) / 4 = 38. But actually there are only > 2 tasks running, which requires only Math.max(20, 2/4) = 20 executors. > {code:java} > ./bin/spark-shell --master yarn --conf spark.speculation=true --conf > spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf > spark.dynamicAllocation.minExecutors=20 --conf > spark.dynamicAllocation.maxExecutors=1000 {code} > {code:java} > val n = 4000 > val someRDD = sc.parallelize(1 to n, n) > someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { > if (index > 3998) { > Thread.sleep(1000 * 1000) > } else if (index > 3850) { > Thread.sleep(50 * 1000) // Fake running tasks > } else { > Thread.sleep(100) > } > Array.fill[Int](1)(1).iterator > }) {code} > > I will have a PR ready to fix this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41641) Implement `Column.over`
[ https://issues.apache.org/jira/browse/SPARK-41641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650065#comment-17650065 ] Apache Spark commented on SPARK-41641: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39149 > Implement `Column.over` > --- > > Key: SPARK-41641 > URL: https://issues.apache.org/jira/browse/SPARK-41641 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41641) Implement `Column.over`
[ https://issues.apache.org/jira/browse/SPARK-41641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41641: Assignee: (was: Apache Spark) > Implement `Column.over` > --- > > Key: SPARK-41641 > URL: https://issues.apache.org/jira/browse/SPARK-41641 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41641) Implement `Column.over`
[ https://issues.apache.org/jira/browse/SPARK-41641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41641: Assignee: Apache Spark > Implement `Column.over` > --- > > Key: SPARK-41641 > URL: https://issues.apache.org/jira/browse/SPARK-41641 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41640) implement `Window` functions
[ https://issues.apache.org/jira/browse/SPARK-41640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41640: Assignee: (was: Apache Spark) > implement `Window` functions > > > Key: SPARK-41640 > URL: https://issues.apache.org/jira/browse/SPARK-41640 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41640) implement `Window` functions
[ https://issues.apache.org/jira/browse/SPARK-41640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650064#comment-17650064 ] Apache Spark commented on SPARK-41640: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39149 > implement `Window` functions > > > Key: SPARK-41640 > URL: https://issues.apache.org/jira/browse/SPARK-41640 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41640) implement `Window` functions
[ https://issues.apache.org/jira/browse/SPARK-41640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41640: Assignee: Apache Spark > implement `Window` functions > > > Key: SPARK-41640 > URL: https://issues.apache.org/jira/browse/SPARK-41640 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41641) Implement `Column.over`
Ruifeng Zheng created SPARK-41641: - Summary: Implement `Column.over` Key: SPARK-41641 URL: https://issues.apache.org/jira/browse/SPARK-41641 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41640) implement `Window` functions
Ruifeng Zheng created SPARK-41640: - Summary: implement `Window` functions Key: SPARK-41640 URL: https://issues.apache.org/jira/browse/SPARK-41640 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41631) Support lateral column alias in Aggregate code path
[ https://issues.apache.org/jira/browse/SPARK-41631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41631. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39040 [https://github.com/apache/spark/pull/39040] > Support lateral column alias in Aggregate code path > --- > > Key: SPARK-41631 > URL: https://issues.apache.org/jira/browse/SPARK-41631 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Xinyi Yu >Assignee: Xinyi Yu >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41631) Support lateral column alias in Aggregate code path
[ https://issues.apache.org/jira/browse/SPARK-41631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41631: --- Assignee: Xinyi Yu > Support lateral column alias in Aggregate code path > --- > > Key: SPARK-41631 > URL: https://issues.apache.org/jira/browse/SPARK-41631 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Xinyi Yu >Assignee: Xinyi Yu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41639) Remove ScalaReflectionLock
[ https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41639: Assignee: (was: Apache Spark) > Remove ScalaReflectionLock > --- > > Key: SPARK-41639 > URL: https://issues.apache.org/jira/browse/SPARK-41639 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Sandish Kumar HN >Priority: Minor > Fix For: 3.4.0 > > > Following up from PR [https://github.com/apache/spark/pull/38922] to remove > ScalaReflectionLock from SchemaConvertors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41639) Remove ScalaReflectionLock
[ https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41639: Assignee: Apache Spark > Remove ScalaReflectionLock > --- > > Key: SPARK-41639 > URL: https://issues.apache.org/jira/browse/SPARK-41639 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Sandish Kumar HN >Assignee: Apache Spark >Priority: Minor > Fix For: 3.4.0 > > > Following up from PR [https://github.com/apache/spark/pull/38922] to remove > ScalaReflectionLock from SchemaConvertors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41639) Remove ScalaReflectionLock
[ https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650060#comment-17650060 ] Apache Spark commented on SPARK-41639: -- User 'SandishKumarHN' has created a pull request for this issue: https://github.com/apache/spark/pull/39147 > Remove ScalaReflectionLock > --- > > Key: SPARK-41639 > URL: https://issues.apache.org/jira/browse/SPARK-41639 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Sandish Kumar HN >Priority: Minor > Fix For: 3.4.0 > > > Following up from PR [https://github.com/apache/spark/pull/38922] to remove > ScalaReflectionLock from SchemaConvertors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41639) Remove ScalaReflectionLock
[ https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650059#comment-17650059 ] Apache Spark commented on SPARK-41639: -- User 'SandishKumarHN' has created a pull request for this issue: https://github.com/apache/spark/pull/39147 > Remove ScalaReflectionLock > --- > > Key: SPARK-41639 > URL: https://issues.apache.org/jira/browse/SPARK-41639 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Sandish Kumar HN >Priority: Minor > Fix For: 3.4.0 > > > Following up from PR [https://github.com/apache/spark/pull/38922] to remove > ScalaReflectionLock from SchemaConvertors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41639) Remove ScalaReflectionLock
Sandish Kumar HN created SPARK-41639: Summary: Remove ScalaReflectionLock Key: SPARK-41639 URL: https://issues.apache.org/jira/browse/SPARK-41639 Project: Spark Issue Type: Task Components: Protobuf Affects Versions: 3.4.0 Reporter: Sandish Kumar HN Fix For: 3.4.0 Following up from PR [https://github.com/apache/spark/pull/38922] to remove ScalaReflectionLock from SchemaConvertors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41440) Implement DataFrame.randomSplit
[ https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41440: Assignee: jiaan.geng (was: Ruifeng Zheng) > Implement DataFrame.randomSplit > --- > > Key: SPARK-41440 > URL: https://issues.apache.org/jira/browse/SPARK-41440 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41440) Implement DataFrame.randomSplit
[ https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41440. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39017 [https://github.com/apache/spark/pull/39017] > Implement DataFrame.randomSplit > --- > > Key: SPARK-41440 > URL: https://issues.apache.org/jira/browse/SPARK-41440 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41440) Implement DataFrame.randomSplit
[ https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41440: Assignee: Ruifeng Zheng > Implement DataFrame.randomSplit > --- > > Key: SPARK-41440 > URL: https://issues.apache.org/jira/browse/SPARK-41440 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41566) Upgrade netty from 4.1.84.Final to 4.1.86.Final
[ https://issues.apache.org/jira/browse/SPARK-41566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41566. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39116 [https://github.com/apache/spark/pull/39116] > Upgrade netty from 4.1.84.Final to 4.1.86.Final > --- > > Key: SPARK-41566 > URL: https://issues.apache.org/jira/browse/SPARK-41566 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Minor > Fix For: 3.4.0 > > > [HAProxyMessageDecoder Stack Exhaustion > DoS|https://github.com/netty/netty/security/advisories/GHSA-fx2c-96vj-985v] > and > [HTTP Response splitting from assigning header value > iterator|https://github.com/netty/netty/security/advisories/GHSA-hh82-3pmq-7frp] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41566) Upgrade netty from 4.1.84.Final to 4.1.86.Final
[ https://issues.apache.org/jira/browse/SPARK-41566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41566: Assignee: Bjørn Jørgensen > Upgrade netty from 4.1.84.Final to 4.1.86.Final > --- > > Key: SPARK-41566 > URL: https://issues.apache.org/jira/browse/SPARK-41566 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Minor > > [HAProxyMessageDecoder Stack Exhaustion > DoS|https://github.com/netty/netty/security/advisories/GHSA-fx2c-96vj-985v] > and > [HTTP Response splitting from assigning header value > iterator|https://github.com/netty/netty/security/advisories/GHSA-hh82-3pmq-7frp] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40520) Add a script to generate DOI manifest
[ https://issues.apache.org/jira/browse/SPARK-40520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-40520. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 27 [https://github.com/apache/spark-docker/pull/27] > Add a script to generate DOI manifest > -- > > Key: SPARK-40520 > URL: https://issues.apache.org/jira/browse/SPARK-40520 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41396) Oneof field support and recursive fields
[ https://issues.apache.org/jira/browse/SPARK-41396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41396: --- Assignee: Sandish Kumar HN > Oneof field support and recursive fields > > > Key: SPARK-41396 > URL: https://issues.apache.org/jira/browse/SPARK-41396 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 2.3.0 >Reporter: Sandish Kumar HN >Assignee: Sandish Kumar HN >Priority: Major > > We should add support for protobuf OneOf fields to Spark-Protobuf. This will > involve implementing logic to detect when a protobuf message contains a OneOf > field, and to handle it appropriately when using from_protobuf and > to_protobuf. > We should add unit tests to ensure that the implementation of protobuf OneOf > field support is correct. > Users can use protobuf OneOf fields with Spark-Protobuf, making it more > complete and useful for processing protobuf data. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41396) Oneof field support and recursive fields
[ https://issues.apache.org/jira/browse/SPARK-41396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41396. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38922 [https://github.com/apache/spark/pull/38922] > Oneof field support and recursive fields > > > Key: SPARK-41396 > URL: https://issues.apache.org/jira/browse/SPARK-41396 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 2.3.0 >Reporter: Sandish Kumar HN >Assignee: Sandish Kumar HN >Priority: Major > Fix For: 3.4.0 > > > We should add support for protobuf OneOf fields to Spark-Protobuf. This will > involve implementing logic to detect when a protobuf message contains a OneOf > field, and to handle it appropriately when using from_protobuf and > to_protobuf. > We should add unit tests to ensure that the implementation of protobuf OneOf > field support is correct. > Users can use protobuf OneOf fields with Spark-Protobuf, making it more > complete and useful for processing protobuf data. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
[ https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41584. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39125 [https://github.com/apache/spark/pull/39125] > Upgrade RoaringBitmap to 0.9.36 > --- > > Key: SPARK-41584 > URL: https://issues.apache.org/jira/browse/SPARK-41584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
[ https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41584: Assignee: Yang Jie > Upgrade RoaringBitmap to 0.9.36 > --- > > Key: SPARK-41584 > URL: https://issues.apache.org/jira/browse/SPARK-41584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41589: Assignee: (was: Apache Spark) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] > for more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650043#comment-17650043 ] Apache Spark commented on SPARK-41589: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39146 > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] > for more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41589: Assignee: Apache Spark > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Apache Spark >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] > for more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] for more context. (was: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context.) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] > for more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41638) Move most tests to .sql files
Xinyi Yu created SPARK-41638: Summary: Move most tests to .sql files Key: SPARK-41638 URL: https://issues.apache.org/jira/browse/SPARK-41638 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Xinyi Yu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41637) ORDER BY ALL
[ https://issues.apache.org/jira/browse/SPARK-41637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650036#comment-17650036 ] Apache Spark commented on SPARK-41637: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/39144 > ORDER BY ALL > > > Key: SPARK-41637 > URL: https://issues.apache.org/jira/browse/SPARK-41637 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > This patch adds ORDER BY ALL support to SQL. ORDER BY ALL is syntactic > sugar to sort the output by all the fields, from left to right. It also > allows specifying asc/desc as well as null ordering. This was initially > introduced by DuckDB. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41637) ORDER BY ALL
[ https://issues.apache.org/jira/browse/SPARK-41637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650035#comment-17650035 ] Apache Spark commented on SPARK-41637: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/39144 > ORDER BY ALL > > > Key: SPARK-41637 > URL: https://issues.apache.org/jira/browse/SPARK-41637 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > This patch adds ORDER BY ALL support to SQL. ORDER BY ALL is syntactic > sugar to sort the output by all the fields, from left to right. It also > allows specifying asc/desc as well as null ordering. This was initially > introduced by DuckDB. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41637) ORDER BY ALL
[ https://issues.apache.org/jira/browse/SPARK-41637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41637: Assignee: Apache Spark (was: Reynold Xin) > ORDER BY ALL > > > Key: SPARK-41637 > URL: https://issues.apache.org/jira/browse/SPARK-41637 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Apache Spark >Priority: Major > > This patch adds ORDER BY ALL support to SQL. ORDER BY ALL is syntactic > sugar to sort the output by all the fields, from left to right. It also > allows specifying asc/desc as well as null ordering. This was initially > introduced by DuckDB. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org