[jira] [Updated] (SPARK-46772) Benchmarking Avro with ZSTD
[ https://issues.apache.org/jira/browse/SPARK-46772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46772: --- Labels: pull-request-available (was: ) > Benchmarking Avro with ZSTD > > > Key: SPARK-46772 > URL: https://issues.apache.org/jira/browse/SPARK-46772 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
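For context, a hedged PySpark sketch of the kind of write such a benchmark would exercise, assuming the Avro codec is selected through the spark.sql.avro.compression.codec configuration with "zstandard" as a supported value; the row count and output path are placeholders:

{code:python}
# Requires the external Avro module on the classpath, e.g.
# --packages org.apache.spark:spark-avro_2.13:<spark-version>.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Assumed codec value, per the ticket title; "snappy" is the usual default.
spark.conf.set("spark.sql.avro.compression.codec", "zstandard")

df = spark.range(1_000_000)
df.write.format("avro").mode("overwrite").save("/tmp/avro-zstd-bench")
{code}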
[jira] [Resolved] (SPARK-46808) Refine error classes in Python with automatic sorting function
[ https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46808. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44848 [https://github.com/apache/spark/pull/44848] > Refine error classes in Python with automatic sorting function > -- > > Key: SPARK-46808 > URL: https://issues.apache.org/jira/browse/SPARK-46808 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > There are too many inconsistencies within error_classes, and there's no way to > automatically generate/sort the error classes. We should make development easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
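A minimal sketch of the kind of automatic sorting this ticket describes, assuming the error classes are kept as a JSON mapping; the function name and file handling are illustrative, not PySpark's actual implementation:

{code:python}
# Load an error-class mapping and rewrite it with class names (and any
# nested keys) sorted alphabetically, keeping the file diff-stable.
import json

def write_sorted_error_classes(path: str) -> None:
    with open(path) as f:
        error_classes = json.load(f)
    with open(path, "w") as f:
        json.dump(error_classes, f, indent=2, sort_keys=True)
        f.write("\n")
{code}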
[jira] [Assigned] (SPARK-46808) Refine error classes in Python with automatic sorting function
[ https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46808: Assignee: Hyukjin Kwon > Refine error classes in Python with automatic sorting function > -- > > Key: SPARK-46808 > URL: https://issues.apache.org/jira/browse/SPARK-46808 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > There are too many inconsistencies within error_classes, and there's no way to > automatically generate/sort the error classes. We should make development easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46807) Include automation notice in SQL error class documents
[ https://issues.apache.org/jira/browse/SPARK-46807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46807: Assignee: Nicholas Chammas > Include automation notice in SQL error class documents > -- > > Key: SPARK-46807 > URL: https://issues.apache.org/jira/browse/SPARK-46807 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46807) Include automation notice in SQL error class documents
[ https://issues.apache.org/jira/browse/SPARK-46807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46807. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44847 [https://github.com/apache/spark/pull/44847] > Include automation notice in SQL error class documents > -- > > Key: SPARK-46807 > URL: https://issues.apache.org/jira/browse/SPARK-46807 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45869) Revisit and Improve Spark Standalone Cluster
[ https://issues.apache.org/jira/browse/SPARK-45869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45869: -- Priority: Critical (was: Major) > Revisit and Improve Spark Standalone Cluster > > > Key: SPARK-45869 > URL: https://issues.apache.org/jira/browse/SPARK-45869 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Critical > Labels: releasenotes > Fix For: 4.0.0 > > > This is an experimental internal configuration for advanced users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`
[ https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46800: -- Priority: Major (was: Critical) > Support `spark.deploy.spreadOutDrivers` > --- > > Key: SPARK-46800 > URL: https://issues.apache.org/jira/browse/SPARK-46800 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`
[ https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46800: -- Priority: Critical (was: Major) > Support `spark.deploy.spreadOutDrivers` > --- > > Key: SPARK-46800 > URL: https://issues.apache.org/jira/browse/SPARK-46800 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46718) Upgrade Arrow to 15.0.0
[ https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46718. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44797 [https://github.com/apache/spark/pull/44797] > Upgrade Arrow to 15.0.0 > --- > > Key: SPARK-46718 > URL: https://issues.apache.org/jira/browse/SPARK-46718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-01-15-14-02-57-814.png > > > https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46718) Upgrade Arrow to 15.0.0
[ https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46718: - Assignee: Yang Jie > Upgrade Arrow to 15.0.0 > --- > > Key: SPARK-46718 > URL: https://issues.apache.org/jira/browse/SPARK-46718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Attachments: image-2024-01-15-14-02-57-814.png > > > https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46805) Upgrade `scalafmt` to 3.7.17
[ https://issues.apache.org/jira/browse/SPARK-46805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46805. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44845 [https://github.com/apache/spark/pull/44845] > Upgrade `scalafmt` to 3.7.17 > > > Key: SPARK-46805 > URL: https://issues.apache.org/jira/browse/SPARK-46805 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46808) Refine error classes in Python with automatic sorting function
[ https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46808: - Parent: SPARK-45673 Issue Type: Sub-task (was: Test) > Refine error classes in Python with automatic sorting function > -- > > Key: SPARK-46808 > URL: https://issues.apache.org/jira/browse/SPARK-46808 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > There are too many inconsistencies within error_classes, and there's no way to > automatically generate/sort the error classes. We should make development easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46809) Check error message parameter properly
[ https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46809: Description: If an error message parameter from the template is missing in actual usage, or its name is different, an exception should be raised, but currently it is not. We should handle this so that it works properly. (was: If an error message parameter from the template is missing in actual usage, an exception should be raised, but currently it is not. We should handle this so that it works properly.) > Check error message parameter properly > -- > > Key: SPARK-46809 > URL: https://issues.apache.org/jira/browse/SPARK-46809 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > If an error message parameter from the template is missing in actual usage, or > its name is different, an exception should be raised, but currently it is not. > We should handle this so that it works properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
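A minimal sketch of the proposed check, assuming str.format-style placeholders in the message templates; the function name is illustrative:

{code:python}
# Compare the placeholders a template declares against the parameters
# actually supplied; raise if any are missing or misnamed (the misnamed
# case is what the updated description adds).
from string import Formatter

def check_message_parameters(template: str, parameters: dict) -> None:
    expected = {f for _, f, _, _ in Formatter().parse(template) if f}
    supplied = set(parameters)
    if expected != supplied:
        raise ValueError(
            f"Template expects {sorted(expected)}, got {sorted(supplied)}"
        )

check_message_parameters("Cannot resolve column {name}.", {"name": "id"})   # passes
# check_message_parameters("Cannot resolve column {name}.", {"nme": "id"})  # raises
{code}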
[jira] [Updated] (SPARK-46809) Check error message parameter properly
[ https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46809: Summary: Check error message parameter properly (was: Check missing error message parameter properly) > Check error message parameter properly > -- > > Key: SPARK-46809 > URL: https://issues.apache.org/jira/browse/SPARK-46809 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > If an error message parameter from the template is missing in actual usage, an > exception should be raised, but currently it is not. We should handle this so that > it works properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46809) Check missing error message parameter properly
Haejoon Lee created SPARK-46809: --- Summary: Check missing error message parameter properly Key: SPARK-46809 URL: https://issues.apache.org/jira/browse/SPARK-46809 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee If an error message parameter from the template is missing in actual usage, an exception should be raised, but currently it is not. We should handle this so that it works properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46806) Improve error message for spark.table when argument type is wrong
[ https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46806. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44846 [https://github.com/apache/spark/pull/44846] > Improve error message for spark.table when argument type is wrong > - > > Key: SPARK-46806 > URL: https://issues.apache.org/jira/browse/SPARK-46806 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > >>> spark.table(None) > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 1710, in table > return DataFrame(self._jsparkSession.table(tableName), self) > > File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, > in deco > return f(*a, **kw) >^^^ > File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line > 326, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o27.table. > : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" > is null > at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222) > at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212) > at > org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54) > at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681) > at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > at py4j.Gateway.invoke(Gateway.java:282) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
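A sketch of the Python-side guard that avoids the Py4J NullPointerException shown above: validate the argument before it reaches the JVM. PySparkTypeError is PySpark's existing error type; the helper itself and the use of the NOT_STR error class here are assumptions, not the merged change:

{code:python}
from pyspark.errors import PySparkTypeError

def _require_str_table_name(tableName) -> str:
    # Fail fast with a clear PySpark error instead of forwarding None
    # (or another non-str) to the JVM-side SparkSession.table.
    if not isinstance(tableName, str):
        raise PySparkTypeError(
            error_class="NOT_STR",  # assumed error class
            message_parameters={
                "arg_name": "tableName",
                "arg_type": type(tableName).__name__,
            },
        )
    return tableName
{code}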
[jira] [Assigned] (SPARK-46806) Improve error message for spark.table when argument type is wrong
[ https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46806: Assignee: Hyukjin Kwon > Improve error message for spark.table when argument type is wrong > - > > Key: SPARK-46806 > URL: https://issues.apache.org/jira/browse/SPARK-46806 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > >>> spark.table(None) > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 1710, in table > return DataFrame(self._jsparkSession.table(tableName), self) > > File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, > in deco > return f(*a, **kw) >^^^ > File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line > 326, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o27.table. > : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" > is null > at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222) > at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212) > at > org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54) > at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681) > at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > at py4j.Gateway.invoke(Gateway.java:282) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46808) Refine error classes in Python with automatic sorting function
[ https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46808: --- Labels: pull-request-available (was: ) > Refine error classes in Python with automatic sorting function > -- > > Key: SPARK-46808 > URL: https://issues.apache.org/jira/browse/SPARK-46808 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > There are too many inconsistencies within error_classes, and there's no way to > automatically generate/sort the error classes. We should make development easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46807) Include automation notice in SQL error class documents
[ https://issues.apache.org/jira/browse/SPARK-46807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46807: --- Labels: pull-request-available (was: ) > Include automation notice in SQL error class documents > -- > > Key: SPARK-46807 > URL: https://issues.apache.org/jira/browse/SPARK-46807 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46808) Refine error classes in Python with automatic sorting function
Hyukjin Kwon created SPARK-46808: Summary: Refine error classes in Python with automatic sorting function Key: SPARK-46808 URL: https://issues.apache.org/jira/browse/SPARK-46808 Project: Spark Issue Type: Test Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon There are too many inconsistencies within error_classes, and there's no way to automatically generate/sort the error classes. We should make development easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46807) Include automation notice in SQL error class documents
Nicholas Chammas created SPARK-46807: Summary: Include automation notice in SQL error class documents Key: SPARK-46807 URL: https://issues.apache.org/jira/browse/SPARK-46807 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 4.0.0 Reporter: Nicholas Chammas -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`
[ https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46800. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44840 [https://github.com/apache/spark/pull/44840] > Support `spark.deploy.spreadOutDrivers` > --- > > Key: SPARK-46800 > URL: https://issues.apache.org/jira/browse/SPARK-46800 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46806) Improve error message for spark.table when argument type is wrong
[ https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46806: --- Labels: pull-request-available (was: ) > Improve error message for spark.table when argument type is wrong > - > > Key: SPARK-46806 > URL: https://issues.apache.org/jira/browse/SPARK-46806 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > >>> spark.table(None) > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 1710, in table > return DataFrame(self._jsparkSession.table(tableName), self) > > File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, > in deco > return f(*a, **kw) >^^^ > File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line > 326, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o27.table. > : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" > is null > at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222) > at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212) > at > org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54) > at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681) > at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > at py4j.Gateway.invoke(Gateway.java:282) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46806) Improve error message for spark.table when argument type is wrong
[ https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46806: - Description: {code} >>> spark.table(None) Traceback (most recent call last): File "", line 1, in File "/.../spark/python/pyspark/sql/session.py", line 1710, in table return DataFrame(self._jsparkSession.table(tableName), self) File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, in deco return f(*a, **kw) ^^^ File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o27.table. : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" is null at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222) at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212) at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54) at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681) at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) {code} was: {code} >>> spark.table(None) Traceback (most recent call last): File "", line 1, in File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/session.py", line 1710, in table return DataFrame(self._jsparkSession.table(tableName), self) File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/errors/exceptions/captured.py", line 215, in deco return f(*a, **kw) ^^^ File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o27.table. 
: java.lang.NullPointerException: Cannot invoke "String.length()" because "s" is null at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222) at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212) at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54) at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681) at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) {code} > Improve error message for spark.table when argument type is wrong > - > > Key: SPARK-46806 > URL: https://issues.apache.org/jira/browse/SPARK-46806 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > >>> spark.table(None) > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 1710, in table > return DataFrame(self._jsparkSession.table(tableName), self) >
[jira] [Created] (SPARK-46806) Improve error message for spark.table when argument type is wrong
Hyukjin Kwon created SPARK-46806: Summary: Improve error message for spark.table when argument type is wrong Key: SPARK-46806 URL: https://issues.apache.org/jira/browse/SPARK-46806 Project: Spark Issue Type: Test Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} >>> spark.table(None) Traceback (most recent call last): File "", line 1, in File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/session.py", line 1710, in table return DataFrame(self._jsparkSession.table(tableName), self) File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/errors/exceptions/captured.py", line 215, in deco return f(*a, **kw) ^^^ File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o27.table. : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" is null at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222) at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212) at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54) at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681) at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46803) Remove scala-2.13 profile
[ https://issues.apache.org/jira/browse/SPARK-46803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan resolved SPARK-46803. - Resolution: Not A Problem > Remove scala-2.13 profile > - > > Key: SPARK-46803 > URL: https://issues.apache.org/jira/browse/SPARK-46803 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46804) Recover the generated documents
[ https://issues.apache.org/jira/browse/SPARK-46804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46804: - Assignee: Dongjoon Hyun > Recover the generated documents > --- > > Key: SPARK-46804 > URL: https://issues.apache.org/jira/browse/SPARK-46804 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46804) Recover the generated documents
[ https://issues.apache.org/jira/browse/SPARK-46804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46804. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44843 [https://github.com/apache/spark/pull/44843] > Recover the generated documents > --- > > Key: SPARK-46804 > URL: https://issues.apache.org/jira/browse/SPARK-46804 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46802) Cleanup codecov script
[ https://issues.apache.org/jira/browse/SPARK-46802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46802. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44842 [https://github.com/apache/spark/pull/44842] > Cleanup codecov script > -- > > Key: SPARK-46802 > URL: https://issues.apache.org/jira/browse/SPARK-46802 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > We used to use {{coverage_daemon.py}} to track the coverage of the Python > worker side (SPARK-7721). However, it seems it no longer works. We should > remove it first, and think about other approaches. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46802) Cleanup codecov script
[ https://issues.apache.org/jira/browse/SPARK-46802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46802: Assignee: Hyukjin Kwon > Cleanup codecov script > -- > > Key: SPARK-46802 > URL: https://issues.apache.org/jira/browse/SPARK-46802 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > We used to use {{coverage_daemon.py}} to track the coverage of the Python > worker side (SPARK-7721). However, it seems it no longer works. We should > remove it first, and think about other approaches. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46803) Remove scala-2.13 profile
[ https://issues.apache.org/jira/browse/SPARK-46803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46803: --- Labels: pull-request-available (was: ) > Remove scala-2.13 profile > - > > Key: SPARK-46803 > URL: https://issues.apache.org/jira/browse/SPARK-46803 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46804) Recover the generated documents
[ https://issues.apache.org/jira/browse/SPARK-46804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46804: --- Labels: pull-request-available (was: ) > Recover the generated documents > --- > > Key: SPARK-46804 > URL: https://issues.apache.org/jira/browse/SPARK-46804 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46804) Recover the generated documents
Dongjoon Hyun created SPARK-46804: - Summary: Recover the generated documents Key: SPARK-46804 URL: https://issues.apache.org/jira/browse/SPARK-46804 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46803) Remove scala-2.13 profile
BingKun Pan created SPARK-46803: --- Summary: Remove scala-2.13 profile Key: SPARK-46803 URL: https://issues.apache.org/jira/browse/SPARK-46803 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46802) Cleanup codecov script
[ https://issues.apache.org/jira/browse/SPARK-46802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46802: --- Labels: pull-request-available (was: ) > Cleanup codecov script > -- > > Key: SPARK-46802 > URL: https://issues.apache.org/jira/browse/SPARK-46802 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > We used to use {{coverage_daemon.py}} to track the coverage of the Python > worker side (SPARK-7721). However, it seems it no longer works. We should > remove it first, and think about other approaches. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script
[ https://issues.apache.org/jira/browse/SPARK-46801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46801. --- Fix Version/s: 3.4.3 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 44841 [https://github.com/apache/spark/pull/44841] > Do not treat exit 5 as a test failure in Python testing script > -- > > Key: SPARK-46801 > URL: https://issues.apache.org/jira/browse/SPARK-46801 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.4.3, 3.5.1, 4.0.0 > > > {code} > > Running PySpark tests > > Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log > Will test against the following Python executables: ['python3.12'] > Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', > 'pyspark-errors'] > python3.12 python_implementation is CPython > python3.12 version is: Python 3.12.1 > Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: > /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log) > Finished test(python3.12): pyspark.streaming.tests.test_context (12s) > Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: > /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log) > Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s) > Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: > /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log) > test_kinesis_stream > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) > ... skipped "Skipping all Kinesis Python tests as environmental variable > 'ENABLE_KINESIS_TESTS' was not set." > test_kinesis_stream_api > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api) > ... skipped "Skipping all Kinesis Python tests as environmental variable > 'ENABLE_KINESIS_TESTS' was not set." > -- > Ran 0 tests in 0.000s > NO TESTS RAN (skipped=2) > Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; > see logs. > Error: running /__w/spark/spark/python/run-tests > --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 > --python-executables=python3.12 ; received return code 255 > Error: Process completed with exit code 19. > {code} > Scheduled job fails because of exit 5, see > https://github.com/pytest-dev/pytest/issues/2393. This isn't a test failure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
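A minimal sketch of the behavior change, under the assumption that python/run-tests decides pass/fail from each test subprocess's return code; pytest exits with code 5 when no tests were collected (everything skipped), which should count as a pass. The constant and function names are illustrative:

{code:python}
import subprocess

EXIT_NO_TESTS_COLLECTED = 5  # https://github.com/pytest-dev/pytest/issues/2393

def module_passed(command: list) -> bool:
    # Treat "no tests collected" the same as success rather than as a
    # test failure, so fully skipped modules (e.g. Kinesis) don't fail CI.
    retcode = subprocess.call(command)
    return retcode in (0, EXIT_NO_TESTS_COLLECTED)
{code}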
[jira] [Assigned] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script
[ https://issues.apache.org/jira/browse/SPARK-46801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46801: - Assignee: Hyukjin Kwon > Do not treat exit 5 as a test failure in Python testing script > -- > > Key: SPARK-46801 > URL: https://issues.apache.org/jira/browse/SPARK-46801 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > > Running PySpark tests > > Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log > Will test against the following Python executables: ['python3.12'] > Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', > 'pyspark-errors'] > python3.12 python_implementation is CPython > python3.12 version is: Python 3.12.1 > Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: > /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log) > Finished test(python3.12): pyspark.streaming.tests.test_context (12s) > Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: > /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log) > Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s) > Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: > /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log) > test_kinesis_stream > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) > ... skipped "Skipping all Kinesis Python tests as environmental variable > 'ENABLE_KINESIS_TESTS' was not set." > test_kinesis_stream_api > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api) > ... skipped "Skipping all Kinesis Python tests as environmental variable > 'ENABLE_KINESIS_TESTS' was not set." > -- > Ran 0 tests in 0.000s > NO TESTS RAN (skipped=2) > Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; > see logs. > Error: running /__w/spark/spark/python/run-tests > --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 > --python-executables=python3.12 ; received return code 255 > Error: Process completed with exit code 19. > {code} > Scheduled job fails because of exit 5, see > https://github.com/pytest-dev/pytest/issues/2393. This isn't a test failure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46802) Cleanup codecov script
Hyukjin Kwon created SPARK-46802: Summary: Cleanup codecov script Key: SPARK-46802 URL: https://issues.apache.org/jira/browse/SPARK-46802 Project: Spark Issue Type: Test Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We used to use {{coverage_daemon.py}} to track the coverage of the Python worker side (SPARK-7721). However, it seems it no longer works. We should remove it first, and think about other approaches. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`
[ https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46800: - Assignee: Dongjoon Hyun > Support `spark.deploy.spreadOutDrivers` > --- > > Key: SPARK-46800 > URL: https://issues.apache.org/jira/browse/SPARK-46800 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script
[ https://issues.apache.org/jira/browse/SPARK-46801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46801: --- Labels: pull-request-available (was: ) > Do not treat exit 5 as a test failure in Python testing script > -- > > Key: SPARK-46801 > URL: https://issues.apache.org/jira/browse/SPARK-46801 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > > Running PySpark tests > > Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log > Will test against the following Python executables: ['python3.12'] > Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', > 'pyspark-errors'] > python3.12 python_implementation is CPython > python3.12 version is: Python 3.12.1 > Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: > /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log) > Finished test(python3.12): pyspark.streaming.tests.test_context (12s) > Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: > /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log) > Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s) > Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: > /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log) > test_kinesis_stream > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) > ... skipped "Skipping all Kinesis Python tests as environmental variable > 'ENABLE_KINESIS_TESTS' was not set." > test_kinesis_stream_api > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api) > ... skipped "Skipping all Kinesis Python tests as environmental variable > 'ENABLE_KINESIS_TESTS' was not set." > -- > Ran 0 tests in 0.000s > NO TESTS RAN (skipped=2) > Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; > see logs. > Error: running /__w/spark/spark/python/run-tests > --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 > --python-executables=python3.12 ; received return code 255 > Error: Process completed with exit code 19. > {code} > Scheduled job fails because of exit 5, see > https://github.com/pytest-dev/pytest/issues/2393. This isn't a test failure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`
[ https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46800: --- Labels: pull-request-available (was: ) > Support `spark.deploy.spreadOutDrivers` > --- > > Key: SPARK-46800 > URL: https://issues.apache.org/jira/browse/SPARK-46800 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script
Hyukjin Kwon created SPARK-46801: Summary: Do not treat exit 5 as a test failure in Python testing script Key: SPARK-46801 URL: https://issues.apache.org/jira/browse/SPARK-46801 Project: Spark Issue Type: Test Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} Running PySpark tests Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log Will test against the following Python executables: ['python3.12'] Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', 'pyspark-errors'] python3.12 python_implementation is CPython python3.12 version is: Python 3.12.1 Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log) Finished test(python3.12): pyspark.streaming.tests.test_context (12s) Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log) Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s) Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log) test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) ... skipped "Skipping all Kinesis Python tests as environmental variable 'ENABLE_KINESIS_TESTS' was not set." test_kinesis_stream_api (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api) ... skipped "Skipping all Kinesis Python tests as environmental variable 'ENABLE_KINESIS_TESTS' was not set." -- Ran 0 tests in 0.000s NO TESTS RAN (skipped=2) Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; see logs. Error: running /__w/spark/spark/python/run-tests --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 --python-executables=python3.12 ; received return code 255 Error: Process completed with exit code 19. {code} Scheduled job fails because of exit 5, see https://github.com/pytest-dev/pytest/issues/2393. This isn't a test failure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`
Dongjoon Hyun created SPARK-46800: - Summary: Support `spark.deploy.spreadOutDrivers` Key: SPARK-46800 URL: https://issues.apache.org/jira/browse/SPARK-46800 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46799) Improve `MasterSuite` to use nanoTime-based appIDs and workerIDs
[ https://issues.apache.org/jira/browse/SPARK-46799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46799: --- Labels: pull-request-available (was: ) > Improve `MasterSuite` to use nanoTime-based appIDs and workerIDs > > > Key: SPARK-46799 > URL: https://issues.apache.org/jira/browse/SPARK-46799 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
[ https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46797. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44838 [https://github.com/apache/spark/pull/44838] > Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps > --- > > Key: SPARK-46797 > URL: https://issues.apache.org/jira/browse/SPARK-46797 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46781) Test custom data source and input partition (pyspark.sql.datasource)
[ https://issues.apache.org/jira/browse/SPARK-46781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46781. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44808 [https://github.com/apache/spark/pull/44808] > Test custom data source and input partition (pyspark.sql.datasource) > > > Key: SPARK-46781 > URL: https://issues.apache.org/jira/browse/SPARK-46781 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Test custom data source and input partition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46798) Kafka custom partition location assignment in Spark Structured Streaming (rack awareness)
Randall Schwager created SPARK-46798: Summary: Kafka custom partition location assignment in Spark Structured Streaming (rack awareness) Key: SPARK-46798 URL: https://issues.apache.org/jira/browse/SPARK-46798 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.5.0, 3.4.0, 3.3.0, 3.2.0, 3.1.0 Reporter: Randall Schwager SPARK-15406 added Kafka consumer support to Spark Structured Streaming, but it did not add custom partition location assignment as a feature. The Structured Streaming Kafka consumer as it exists today evenly allocates Kafka topic partitions to executors without regard to Kafka broker rack information or executor location. This behavior can drive large cross-AZ networking costs in large deployments. In the [Design Doc|https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw] for SPARK-15406, the ability to assign Kafka partitions to particular executors (a feature which would enable rack awareness) was discussed, but never implemented. For DStreams users, there does seem to be a way to assign Kafka partitions to Spark executors in a custom fashion: [LocationStrategies.PreferFixed|https://github.com/apache/spark/blob/master/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/LocationStrategy.scala#L69]. I'd like to propose, and implement if approved, support for custom partition location assignment. Please find the design doc describing the change [here|https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
[ https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46797: -- Parent: SPARK-45869 Issue Type: Sub-task (was: Improvement) > Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps > --- > > Key: SPARK-46797 > URL: https://issues.apache.org/jira/browse/SPARK-46797 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
[ https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46797: - Assignee: Dongjoon Hyun > Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps > --- > > Key: SPARK-46797 > URL: https://issues.apache.org/jira/browse/SPARK-46797 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
[ https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46797: --- Labels: pull-request-available (was: ) > Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps > --- > > Key: SPARK-46797 > URL: https://issues.apache.org/jira/browse/SPARK-46797 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
Dongjoon Hyun created SPARK-46797: - Summary: Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps Key: SPARK-46797 URL: https://issues.apache.org/jira/browse/SPARK-46797 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
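If this rename lands as titled, the standalone Master setting would change as in this sketch (the old key is the current one; whether the old name remains as a deprecated alias is not stated here):

{code:none}
# spark-defaults.conf (standalone deploy mode)
# before:
spark.deploy.spreadOut       true
# after SPARK-46797:
spark.deploy.spreadOutApps   true
{code}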
[jira] [Commented] (SPARK-46796) RocksDB versionID Mismatch in SST files
[ https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809657#comment-17809657 ] Bhuwan Sahni commented on SPARK-46796: -- PR created - [https://github.com/apache/spark/pull/44837] > RocksDB versionID Mismatch in SST files > --- > > Key: SPARK-46796 > URL: https://issues.apache.org/jira/browse/SPARK-46796 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2 >Reporter: Bhuwan Sahni >Priority: Major > Labels: pull-request-available > > We need to ensure that the correct SST files are used on the executor during > RocksDB load, as per the mapping in metadata.zip. With the current implementation, it's > possible that the executor uses an SST file (with a different UUID) from an > older version which is not the exact file mapped in metadata.zip. This > can cause version ID mismatch errors while loading RocksDB, leading to > streaming query failures. > A few scenarios in which such a situation can occur are: > **Scenario 1 - Distributed file system does not support overwrite > functionality** > # A task T1 on executor A commits a RocksDB snapshot for version X. > # Another task T2 on executor A loads version X-1 and tries to commit X. > During commit, SST files are copied but the metadata file is not overwritten. > # Task T3 is scheduled on A; this task reuses the previously loaded X (loaded in > (2) above) and commits X+1. > # Task T4 is scheduled on A again for state store version X. The executor > deletes SST files corresponding to commit X+1, downloads the metadata for > version X (which was committed in task T1), and loads RocksDB. This would > fail because the metadata in (1) is not compatible with the SST files in (2). > > **Scenario 2 - Multiple older state versions have different DFS files for a > particular SST file.** > In the current logic, we look at all the versions older than X to find whether a > local SST file can be reused. The reuse logic only ensures that the local SST > file was present in any of the previous versions. However, it's possible that two > different older versions had different SST files (`0001-uuid1.sst` and > `0001-uuid2.sst`) uploaded to DFS. These SST files will have the same local > name (with the UUID truncated) and size, but are not compatible due to different > RocksDB version IDs. We need to ensure that the correct SST file (as per > UUID) is picked, as specified in metadata.zip. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
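To make the invariant concrete, here is a minimal sketch of the stricter reuse check the description calls for; all names are hypothetical, and this is not the code from the PR above:

{code:scala}
// Reuse a local SST file only when it came from the exact DFS file
// (UUID included) that metadata.zip records for the version being loaded.
case class SstFileEntry(localName: String, dfsName: String, sizeBytes: Long)

def reusableFiles(
    metadataForVersion: Seq[SstFileEntry],  // parsed from metadata.zip for version X
    localOrigin: Map[String, String]        // local file name -> DFS file it was downloaded from
): Seq[SstFileEntry] = {
  metadataForVersion.filter { entry =>
    // Matching on local name and size alone is unsafe: 0001-uuid1.sst and
    // 0001-uuid2.sst share both, yet carry different RocksDB version IDs.
    localOrigin.get(entry.localName).contains(entry.dfsName)
  }
}
{code}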
[jira] [Updated] (SPARK-46796) RocksDB versionID Mismatch in SST files
[ https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46796: --- Labels: pull-request-available (was: ) > RocksDB versionID Mismatch in SST files > --- > > Key: SPARK-46796 > URL: https://issues.apache.org/jira/browse/SPARK-46796 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2 >Reporter: Bhuwan Sahni >Priority: Major > Labels: pull-request-available > > We need to ensure that the correct SST files are used on the executor during > RocksDB load, as per the mapping in metadata.zip. With the current implementation, it's > possible that the executor uses an SST file (with a different UUID) from an > older version which is not the exact file mapped in metadata.zip. This > can cause version ID mismatch errors while loading RocksDB, leading to > streaming query failures. > A few scenarios in which such a situation can occur are: > **Scenario 1 - Distributed file system does not support overwrite > functionality** > # A task T1 on executor A commits a RocksDB snapshot for version X. > # Another task T2 on executor A loads version X-1 and tries to commit X. > During commit, SST files are copied but the metadata file is not overwritten. > # Task T3 is scheduled on A; this task reuses the previously loaded X (loaded in > (2) above) and commits X+1. > # Task T4 is scheduled on A again for state store version X. The executor > deletes SST files corresponding to commit X+1, downloads the metadata for > version X (which was committed in task T1), and loads RocksDB. This would > fail because the metadata in (1) is not compatible with the SST files in (2). > > **Scenario 2 - Multiple older state versions have different DFS files for a > particular SST file.** > In the current logic, we look at all the versions older than X to find whether a > local SST file can be reused. The reuse logic only ensures that the local SST > file was present in any of the previous versions. However, it's possible that two > different older versions had different SST files (`0001-uuid1.sst` and > `0001-uuid2.sst`) uploaded to DFS. These SST files will have the same local > name (with the UUID truncated) and size, but are not compatible due to different > RocksDB version IDs. We need to ensure that the correct SST file (as per > UUID) is picked, as specified in metadata.zip. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46687) Implement memory-profiler
[ https://issues.apache.org/jira/browse/SPARK-46687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46687: --- Labels: pull-request-available (was: ) > Implement memory-profiler > - > > Key: SPARK-46687 > URL: https://issues.apache.org/jira/browse/SPARK-46687 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46796) RocksDB versionID Mismatch in SST files
[ https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809634#comment-17809634 ] Bhuwan Sahni commented on SPARK-46796: -- Working on a PR for the fix. > RocksDB versionID Mismatch in SST files > --- > > Key: SPARK-46796 > URL: https://issues.apache.org/jira/browse/SPARK-46796 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2 >Reporter: Bhuwan Sahni >Priority: Major > > We need to ensure that the correct SST files are used on the executor during > RocksDB load, as per the mapping in metadata.zip. With the current implementation, it's > possible that the executor uses an SST file (with a different UUID) from an > older version which is not the exact file mapped in metadata.zip. This > can cause version ID mismatch errors while loading RocksDB, leading to > streaming query failures. > A few scenarios in which such a situation can occur are: > **Scenario 1 - Distributed file system does not support overwrite > functionality** > # A task T1 on executor A commits a RocksDB snapshot for version X. > # Another task T2 on executor A loads version X-1 and tries to commit X. > During commit, SST files are copied but the metadata file is not overwritten. > # Task T3 is scheduled on A; this task reuses the previously loaded X (loaded in > (2) above) and commits X+1. > # Task T4 is scheduled on A again for state store version X. The executor > deletes SST files corresponding to commit X+1, downloads the metadata for > version X (which was committed in task T1), and loads RocksDB. This would > fail because the metadata in (1) is not compatible with the SST files in (2). > > **Scenario 2 - Multiple older state versions have different DFS files for a > particular SST file.** > In the current logic, we look at all the versions older than X to find whether a > local SST file can be reused. The reuse logic only ensures that the local SST > file was present in any of the previous versions. However, it's possible that two > different older versions had different SST files (`0001-uuid1.sst` and > `0001-uuid2.sst`) uploaded to DFS. These SST files will have the same local > name (with the UUID truncated) and size, but are not compatible due to different > RocksDB version IDs. We need to ensure that the correct SST file (as per > UUID) is picked, as specified in metadata.zip. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46796) RocksDB versionID Mismatch in SST files
Bhuwan Sahni created SPARK-46796: Summary: RocksDB versionID Mismatch in SST files Key: SPARK-46796 URL: https://issues.apache.org/jira/browse/SPARK-46796 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.5.0, 3.4.1, 3.4.2, 4.0.0, 3.5.1, 3.5.2 Reporter: Bhuwan Sahni We need to ensure that the correct SST files are used on the executor during RocksDB load, as per the mapping in metadata.zip. With the current implementation, it's possible that the executor uses an SST file (with a different UUID) from an older version which is not the exact file mapped in metadata.zip. This can cause version ID mismatch errors while loading RocksDB, leading to streaming query failures. A few scenarios in which such a situation can occur are: **Scenario 1 - Distributed file system does not support overwrite functionality** # A task T1 on executor A commits a RocksDB snapshot for version X. # Another task T2 on executor A loads version X-1 and tries to commit X. During commit, SST files are copied but the metadata file is not overwritten. # Task T3 is scheduled on A; this task reuses the previously loaded X (loaded in (2) above) and commits X+1. # Task T4 is scheduled on A again for state store version X. The executor deletes SST files corresponding to commit X+1, downloads the metadata for version X (which was committed in task T1), and loads RocksDB. This would fail because the metadata in (1) is not compatible with the SST files in (2). **Scenario 2 - Multiple older state versions have different DFS files for a particular SST file.** In the current logic, we look at all the versions older than X to find whether a local SST file can be reused. The reuse logic only ensures that the local SST file was present in any of the previous versions. However, it's possible that two different older versions had different SST files (`0001-uuid1.sst` and `0001-uuid2.sst`) uploaded to DFS. These SST files will have the same local name (with the UUID truncated) and size, but are not compatible due to different RocksDB version IDs. We need to ensure that the correct SST file (as per UUID) is picked, as specified in metadata.zip. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46779) Grouping by subquery with a cached relation can fail
[ https://issues.apache.org/jira/browse/SPARK-46779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46779: - Assignee: Bruce Robbins > Grouping by subquery with a cached relation can fail > > > Key: SPARK-46779 > URL: https://issues.apache.org/jira/browse/SPARK-46779 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > > Example: > {noformat} > create or replace temp view data(c1, c2) as values > (1, 2), > (1, 3), > (3, 7), > (4, 5); > cache table data; > select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2) from > data d2 group by all; > {noformat} > It fails with the following error: > {noformat} > [INTERNAL_ERROR] Couldn't find count(1)#163L in > [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000 > org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L > in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000 > {noformat} > If you don't cache the view, the query succeeds. > Note, in 3.4.2 and 3.5.0 the issue happens only with cached tables, not > cached views. I think that's because cached views were not getting properly > deduplicated in those versions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46779) Grouping by subquery with a cached relation can fail
[ https://issues.apache.org/jira/browse/SPARK-46779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46779. --- Fix Version/s: 3.4.3 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 44806 [https://github.com/apache/spark/pull/44806] > Grouping by subquery with a cached relation can fail > > > Key: SPARK-46779 > URL: https://issues.apache.org/jira/browse/SPARK-46779 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > Fix For: 3.4.3, 3.5.1, 4.0.0 > > > Example: > {noformat} > create or replace temp view data(c1, c2) as values > (1, 2), > (1, 3), > (3, 7), > (4, 5); > cache table data; > select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2) from > data d2 group by all; > {noformat} > It fails with the following error: > {noformat} > [INTERNAL_ERROR] Couldn't find count(1)#163L in > [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000 > org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L > in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000 > {noformat} > If you don't cache the view, the query succeeds. > Note, in 3.4.2 and 3.5.0 the issue happens only with cached tables, not > cached views. I think that's because cached views were not getting properly > deduplicated in those versions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46795) Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql/core
Max Gekk created SPARK-46795: Summary: Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql/core Key: SPARK-46795 URL: https://issues.apache.org/jira/browse/SPARK-46795 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 4.0.0 Replace all UnsupportedOperationException by SparkUnsupportedOperationException in Catalyst code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46795) Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql/core
[ https://issues.apache.org/jira/browse/SPARK-46795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-46795: - Description: Replace all UnsupportedOperationException by SparkUnsupportedOperationException in sql/core code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix. (was: Replace all UnsupportedOperationException by SparkUnsupportedOperationException in Catalyst code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix.) > Replace UnsupportedOperationException by SparkUnsupportedOperationException > in sql/core > --- > > Key: SPARK-46795 > URL: https://issues.apache.org/jira/browse/SPARK-46795 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Replace all UnsupportedOperationException by > SparkUnsupportedOperationException in sql/core code base, and introduce new > legacy error classes with the _LEGACY_ERROR_TEMP_ prefix. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
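As a rough illustration of the mechanical change this ticket asks for (the error class ID below is a placeholder, not an entry registered in error-classes.json):

{code:scala}
import org.apache.spark.SparkUnsupportedOperationException

def unsupportedMerge(left: String, right: String): Nothing = {
  // Before: throw new UnsupportedOperationException(s"Cannot merge $left with $right")
  // After: a Spark-typed error whose message template lives in error-classes.json.
  throw new SparkUnsupportedOperationException(
    errorClass = "_LEGACY_ERROR_TEMP_XXXX", // hypothetical placeholder ID
    messageParameters = Map("left" -> left, "right" -> right))
}
{code}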
[jira] [Updated] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nirav patel updated SPARK-46762: Summary: Spark Connect 3.5 Classloading issue with external jar (was: Spark Connect 3.5 Classloading issue) > Spark Connect 3.5 Classloading issue with external jar > -- > > Key: SPARK-46762 > URL: https://issues.apache.org/jira/browse/SPARK-46762 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: nirav patel >Priority: Major > > We are seeing the following `java.lang.ClassCastException` error in Spark > executors when using spark-connect 3.5 with an external Spark SQL catalog jar - > iceberg-spark-runtime-3.5_2.12-1.4.3.jar > We also set "spark.executor.userClassPathFirst=true"; otherwise the child class > gets loaded by MutableClassLoader and the parent class gets loaded by > ChildFirstClassLoader, which causes a ClassCastException as well. > > {code:java} > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): > java.lang.ClassCastException: class > org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to > class org.apache.iceberg.Table > (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed > module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; > org.apache.iceberg.Table is in unnamed module of loader > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943) > at > org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88) > at > org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50) > at > org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > at org.apache.spark.scheduler.Task.run(Task.scala:141) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > at org.apach...{code} > > `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of > `org.apache.iceberg.Table` and they are both in only one jar, > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar`. > We verified that there's only one jar of > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` loaded when the spark-connect server > is started. 
> Looking more into the error, it seems the classloader itself is instantiated multiple > times somewhere. I can see two instances: > org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943 > > *Affected version:* > spark 3.5 and spark-connect_2.12:3.5.0 > > *Not affected version and variation:* > Spark 3.4 and spark-connect_2.12:3.4.0 works fine with the external jar > Also works with just the Spark 3.5 spark-submit script directly (i.e. without using > spark-connect 3.5) > > The issue has been opened with Iceberg as well: > [https://github.com/apache/iceberg/issues/8978] > and has been discussed on dev@org.apache.iceberg: > [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
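The two loader instances above are enough to explain the ClassCastException: the JVM identifies a class by its name plus its defining loader, so the same bytes loaded by two loaders yield two distinct, cast-incompatible classes. A self-contained sketch (the jar path is made up):

{code:scala}
import java.net.{URL, URLClassLoader}

val jar = Array(new URL("file:/opt/jars/iceberg-spark-runtime-3.5_2.12-1.4.3.jar"))
val loaderA = new URLClassLoader(jar, null) // parent = null: no delegation
val loaderB = new URLClassLoader(jar, null)

val tableA = loaderA.loadClass("org.apache.iceberg.Table")
val tableB = loaderB.loadClass("org.apache.iceberg.Table")
assert(tableA ne tableB)                 // same name and bytes, distinct classes
assert(!tableA.isAssignableFrom(tableB)) // so a cast across loaders fails
{code}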
[jira] [Updated] (SPARK-46763) ReplaceDeduplicateWithAggregate fails when non-grouping keys have duplicate attributes
[ https://issues.apache.org/jira/browse/SPARK-46763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46763: --- Labels: pull-request-available (was: ) > ReplaceDeduplicateWithAggregate fails when non-grouping keys have duplicate > attributes > -- > > Key: SPARK-46763 > URL: https://issues.apache.org/jira/browse/SPARK-46763 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.5.0 >Reporter: Nikhil Sheoran >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45404) Support AWS_ENDPOINT_URL env variable
[ https://issues.apache.org/jira/browse/SPARK-45404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809536#comment-17809536 ] Steve Loughran commented on SPARK-45404: Just saw this while working on SPARK-35878. If you are copying endpoints then you may also want to think about picking up the region from AWS_REGION too. The full list of env vars which I have collected by looking in the AWS SDKs is up at https://github.com/steveloughran/cloudstore/blob/main/src/main/java/org/apache/hadoop/fs/store/diag/S3ADiagnosticsInfo.java#L379 I do not know what they all mean or do, only that if I get a support call I want to know if anyone has been setting them. > Support AWS_ENDPOINT_URL env variable > - > > Key: SPARK-45404 > URL: https://issues.apache.org/jira/browse/SPARK-45404 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
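A hedged sketch of what honoring these variables could look like on the Spark side; fs.s3a.endpoint and fs.s3a.endpoint.region are existing S3A keys, but the wiring below is only illustrative, not what the ticket implemented:

{code:scala}
import org.apache.spark.sql.SparkSession

val builder = SparkSession.builder().appName("s3a-env-example")
// Mirror the AWS SDK's env vars into the corresponding S3A settings.
sys.env.get("AWS_ENDPOINT_URL").foreach(ep =>
  builder.config("spark.hadoop.fs.s3a.endpoint", ep))
sys.env.get("AWS_REGION").foreach(region =>
  builder.config("spark.hadoop.fs.s3a.endpoint.region", region))
val spark = builder.getOrCreate()
{code}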
[jira] [Created] (SPARK-46794) Incorrect results due to inferred predicate from checkpoint with subquery
Tom van Bussel created SPARK-46794: -- Summary: Incorrect results due to inferred predicate from checkpoint with subquery Key: SPARK-46794 URL: https://issues.apache.org/jira/browse/SPARK-46794 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Tom van Bussel Spark can produce incorrect results when using a checkpointed DataFrame with a filter containing a scalar subquery. This subquery is included in the constraints of the resulting LogicalRDD, and may then be propagated as a filter when joining with the checkpointed DataFrame. This causes the subquery to be evaluated twice: once during checkpointing and once while evaluating the query. These two subquery evaluations may return different results, e.g. when the subquery contains a limit with an underspecified sort order. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
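A sketch of the failure shape, assuming an active SparkSession `spark` and two hypothetical tables; the ORDER BY below can have ties, so the subquery's result is underspecified:

{code:scala}
spark.sparkContext.setCheckpointDir("/tmp/checkpoints") // required before checkpoint()

val filtered = spark.sql(
  """SELECT * FROM events
    |WHERE key = (SELECT key FROM picks ORDER BY score LIMIT 1)""".stripMargin)

val cp = filtered.checkpoint() // subquery evaluated once; rows frozen in the RDD

// The subquery survives in the LogicalRDD's constraints, so a join can
// propagate it to the other side as an inferred filter and evaluate it a
// second time, possibly selecting a different row than the first run did.
val joined = spark.table("events").join(cp, "key")
{code}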
[jira] [Updated] (SPARK-46792) Refactor ChannelBuilder system
[ https://issues.apache.org/jira/browse/SPARK-46792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46792: --- Labels: pull-request-available (was: ) > Refactor ChannelBuilder system > -- > > Key: SPARK-46792 > URL: https://issues.apache.org/jira/browse/SPARK-46792 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Alice Sayutina >Priority: Minor > Labels: pull-request-available > > Refactor ChannelBuilder to separate the specific channel builder > implementation from the abstract class for other channel builders -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46793) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-46793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-46793: --- Summary: Revert S3A endpoint fixup logic of SPARK-35878 (was: Revert region fixup logic of SPARK-35878) > Revert S3A endpoint fixup logic of SPARK-35878 > -- > > Key: SPARK-46793 > URL: https://issues.apache.org/jira/browse/SPARK-46793 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.5.0, 3.4.3 >Reporter: Steve Loughran >Priority: Major > > The v2 SDK does its region resolution "differently", and the changes of > SPARK-35878 actually create problems. > That PR went in to fix a regression in Hadoop 3.3.1 that has been fixed > since 3.3.2; removing it is not going to cause problems for anyone not using > the 3.3.1 release, which is 3 years old and replaced by multiple follow-on > 3.3.x releases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46793) Revert region fixup logic of SPARK-35878
Steve Loughran created SPARK-46793: -- Summary: Revert region fixup logic of SPARK-35878 Key: SPARK-46793 URL: https://issues.apache.org/jira/browse/SPARK-46793 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.5.0, 3.4.3 Reporter: Steve Loughran The v2 SDK does its region resolution "differently", and the changes of SPARK-35878 actually create problems. That PR went in to fix a regression in Hadoop 3.3.1 that has been fixed since 3.3.2; removing it is not going to cause problems for anyone not using the 3.3.1 release, which is 3 years old and replaced by multiple follow-on 3.3.x releases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46792) Refactor ChannelBuilder system
Alice Sayutina created SPARK-46792: -- Summary: Refactor ChannelBuilder system Key: SPARK-46792 URL: https://issues.apache.org/jira/browse/SPARK-46792 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Alice Sayutina Refactor ChannelBuilder to separate the specific channel builder implementation from the abstract class for other channel builders -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
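A hypothetical sketch of the separation being proposed, with a minimal stand-in for the transport handle; none of these names are from the actual patch:

{code:scala}
trait Channel // stand-in for the underlying transport handle

// Abstract contract that any channel builder implements.
abstract class ChannelBuilder(val url: String) {
  def build(): Channel
}

// The default builder keeps URL parsing and transport configuration out of
// the abstract class, so alternative builders can extend ChannelBuilder
// without inheriting those specifics.
class DefaultChannelBuilder(url: String) extends ChannelBuilder(url) {
  override def build(): Channel = new Channel {} // configure the real transport here
}
{code}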
[jira] [Resolved] (SPARK-46791) Support Java `Set` in JavaTypeInference
[ https://issues.apache.org/jira/browse/SPARK-46791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46791. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44828 [https://github.com/apache/spark/pull/44828] > Support Java `Set` in JavaTypeInference > --- > > Key: SPARK-46791 > URL: https://issues.apache.org/jira/browse/SPARK-46791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Scala Set (scala.collection.Set) is supported in ScalaReflection, so users can > encode a Scala Set in a Dataset. But Java Set is not supported in the bean encoder > (i.e., JavaTypeInference). This inconsistency means Java users cannot > use Set as Scala users do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46777) Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and `StreamingDataSourceScanRelation` for parity with batch scan
[ https://issues.apache.org/jira/browse/SPARK-46777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-46777. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44818 [https://github.com/apache/spark/pull/44818] > Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and > `StreamingDataSourceScanRelation` for parity with batch scan > -- > > Key: SPARK-46777 > URL: https://issues.apache.org/jira/browse/SPARK-46777 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jackie Zhang >Assignee: Jackie Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To prepare for the upcoming structured streaming operator pushdown, we'd like > to refactor some Catalyst object relationships first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46777) Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and `StreamingDataSourceScanRelation` for parity with batch scan
[ https://issues.apache.org/jira/browse/SPARK-46777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-46777: Assignee: Jackie Zhang > Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and > `StreamingDataSourceScanRelation` for parity with batch scan > -- > > Key: SPARK-46777 > URL: https://issues.apache.org/jira/browse/SPARK-46777 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jackie Zhang >Assignee: Jackie Zhang >Priority: Major > Labels: pull-request-available > > To prepare for the upcoming structured streaming operator pushdown, we'd like > to refactor some Catalyst object relationships first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46789) Add `VolumeSuite` to K8s IT
[ https://issues.apache.org/jira/browse/SPARK-46789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46789: -- Fix Version/s: 3.5.1 > Add `VolumeSuite` to K8s IT > --- > > Key: SPARK-46789 > URL: https://issues.apache.org/jira/browse/SPARK-46789 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46738) `Cast` displayed different results between Regular Spark and Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46738: --- Labels: pull-request-available (was: ) > `Cast` displayed different results between Regular Spark and Spark Connect > -- > > Key: SPARK-46738 > URL: https://issues.apache.org/jira/browse/SPARK-46738 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > Attachments: screenshot-1.png > > > The following doctest will throw an error in the tests of the pyspark-connect > module > {code:java} > Example 5: Decrypt data with key. > >>> import pyspark.sql.functions as sf > >>> df = spark.createDataFrame([( > ... "83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94", > ... "",)], > ... ["input", "key"] > ... ) > >>> df.select(sf.try_aes_decrypt( > ... sf.unhex(df.input), df.key > ... ).cast("STRING")).show(truncate=False) # doctest: +SKIP > +--+ > |CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)| > +--+ > |Spark | > +--+ {code} > {code:java} > df.select(sf.try_aes_decrypt( > 4170sf.unhex(df.input), df.key > 4171).cast("STRING")).show(truncate=False) > 4172Expected: > 4173+--+ > 4174|CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)| > 4175+--+ > 4176|Spark | > 4177+--+ > 4178Got: > 4179+--+ > 4180|try_aes_decrypt(unhex(input), key, GCM, DEFAULT, )| > 4181+--+ > 4182|Spark | > 4183+--+{code} > !screenshot-1.png|width=671,height=222! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46738) `Cast` displayed different results between Regular Spark and Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-46738: Summary: `Cast` displayed different results between Regular Spark and Spark Connect (was: `Cast` of pyspark displayed different results between Regular Spark and Spark Connect) > `Cast` displayed different results between Regular Spark and Spark Connect > -- > > Key: SPARK-46738 > URL: https://issues.apache.org/jira/browse/SPARK-46738 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Attachments: screenshot-1.png > > > The following doctest will throw an error in the tests of the pyspark-connect > module > {code:java} > Example 5: Decrypt data with key. > >>> import pyspark.sql.functions as sf > >>> df = spark.createDataFrame([( > ... "83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94", > ... "",)], > ... ["input", "key"] > ... ) > >>> df.select(sf.try_aes_decrypt( > ... sf.unhex(df.input), df.key > ... ).cast("STRING")).show(truncate=False) # doctest: +SKIP > +--+ > |CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)| > +--+ > |Spark | > +--+ {code} > {code:java} > df.select(sf.try_aes_decrypt( > 4170sf.unhex(df.input), df.key > 4171).cast("STRING")).show(truncate=False) > 4172Expected: > 4173+--+ > 4174|CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)| > 4175+--+ > 4176|Spark | > 4177+--+ > 4178Got: > 4179+--+ > 4180|try_aes_decrypt(unhex(input), key, GCM, DEFAULT, )| > 4181+--+ > 4182|Spark | > 4183+--+{code} > !screenshot-1.png|width=671,height=222! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46718) Upgrade Arrow to 15.0.0
[ https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46718: -- Assignee: (was: Apache Spark) > Upgrade Arrow to 15.0.0 > --- > > Key: SPARK-46718 > URL: https://issues.apache.org/jira/browse/SPARK-46718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > Attachments: image-2024-01-15-14-02-57-814.png > > > https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46718) Upgrade Arrow to 15.0.0
[ https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46718: -- Assignee: Apache Spark > Upgrade Arrow to 15.0.0 > --- > > Key: SPARK-46718 > URL: https://issues.apache.org/jira/browse/SPARK-46718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > Attachments: image-2024-01-15-14-02-57-814.png > > > https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46789) Add `VolumeSuite` to K8s IT
[ https://issues.apache.org/jira/browse/SPARK-46789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46789: Assignee: Dongjoon Hyun > Add `VolumeSuite` to K8s IT > --- > > Key: SPARK-46789 > URL: https://issues.apache.org/jira/browse/SPARK-46789 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46789) Add `VolumeSuite` to K8s IT
[ https://issues.apache.org/jira/browse/SPARK-46789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46789. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44827 [https://github.com/apache/spark/pull/44827] > Add `VolumeSuite` to K8s IT > --- > > Key: SPARK-46789 > URL: https://issues.apache.org/jira/browse/SPARK-46789 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46673) Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`
[ https://issues.apache.org/jira/browse/SPARK-46673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-46673: Assignee: BingKun Pan > Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt` > -- > > Key: SPARK-46673 > URL: https://issues.apache.org/jira/browse/SPARK-46673 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46673) Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`
[ https://issues.apache.org/jira/browse/SPARK-46673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46673. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44750 [https://github.com/apache/spark/pull/44750] > Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt` > -- > > Key: SPARK-46673 > URL: https://issues.apache.org/jira/browse/SPARK-46673 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46791) Support Java `Set` in JavaTypeInference
[ https://issues.apache.org/jira/browse/SPARK-46791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46791: --- Labels: pull-request-available (was: ) > Support Java `Set` in JavaTypeInference > --- > > Key: SPARK-46791 > URL: https://issues.apache.org/jira/browse/SPARK-46791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > > Scala Set (scala.collection.Set) is supported in ScalaReflection, so users can > encode a Scala Set in a Dataset. But Java Set is not supported in the bean encoder > (i.e., JavaTypeInference). This inconsistency means Java users cannot > use Set as Scala users do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46791) Support Java `Set` in JavaTypeInference
L. C. Hsieh created SPARK-46791: --- Summary: Support Java `Set` in JavaTypeInference Key: SPARK-46791 URL: https://issues.apache.org/jira/browse/SPARK-46791 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh Scala Set (scala.collection.Set) is supported in ScalaReflection, so users can encode a Scala Set in a Dataset. But Java Set is not supported in the bean encoder (i.e., JavaTypeInference). This inconsistency means Java users cannot use Set as Scala users do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
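A sketch of what this enables, using a Scala class with bean-style accessors as a stand-in for a Java bean (class and field names are made up; best compiled into an application jar rather than pasted into a REPL):

{code:scala}
import scala.beans.BeanProperty
import org.apache.spark.sql.{Encoders, SparkSession}

class TagSet {
  @BeanProperty var id: Int = 0
  @BeanProperty var tags: java.util.Set[String] = new java.util.HashSet[String]()
}

object SetBeanExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("set-bean").getOrCreate()

    val bean = new TagSet
    bean.setId(1)
    bean.getTags.add("streaming")

    // Before this change the bean encoder rejected Set-typed properties;
    // with SPARK-46791 it infers an encoder for them, as ScalaReflection
    // already does for scala.collection.Set.
    val ds = spark.createDataset(java.util.Arrays.asList(bean))(Encoders.bean(classOf[TagSet]))
    ds.show()
  }
}
{code}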