[jira] [Assigned] (SPARK-42667) Spark Connect: newSession API
[ https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42667:

    Assignee: Rui Wang  (was: Apache Spark)

> Spark Connect: newSession API
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
> Issue Type: Task
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42667) Spark Connect: newSession API
[ https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42667:

    Assignee: Apache Spark  (was: Rui Wang)

> Spark Connect: newSession API
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
> Issue Type: Task
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Rui Wang
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.
[ https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696317#comment-17696317 ]

Apache Spark commented on SPARK-42662:

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support `withSequenceColumn` as PySpark DataFrame internal function.
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark, PySpark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Turn `withSequenceColumn` into a PySpark internal API to support the distributed-sequence index of the pandas API on Spark in Spark Connect as well.
[jira] [Assigned] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.
[ https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42662:

    Assignee: (was: Apache Spark)

> Support `withSequenceColumn` as PySpark DataFrame internal function.
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark, PySpark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Turn `withSequenceColumn` into a PySpark internal API to support the distributed-sequence index of the pandas API on Spark in Spark Connect as well.
[jira] [Assigned] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.
[ https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42662:

    Assignee: Apache Spark

> Support `withSequenceColumn` as PySpark DataFrame internal function.
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark, PySpark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Apache Spark
> Priority: Major
>
> Turn `withSequenceColumn` into a PySpark internal API to support the distributed-sequence index of the pandas API on Spark in Spark Connect as well.
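The distributed-sequence index that `withSequenceColumn` backs can be sketched in plain Python (a hypothetical illustration of the idea, not the PySpark implementation): each partition's global index is its local row index plus the total row count of all preceding partitions.

```python
from itertools import accumulate

# Toy partitions standing in for a distributed dataset (hypothetical data).
partitions = [["x", "y"], ["z"], ["p", "q", "r"]]

# Offset of each partition = number of rows in all preceding partitions.
sizes = [len(p) for p in partitions]
offsets = [0] + list(accumulate(sizes))[:-1]

# Global sequence number = partition offset + local row index.
indexed = [(off + i, v)
           for off, part in zip(offsets, partitions)
           for i, v in enumerate(part)]
print(indexed)  # [(0, 'x'), (1, 'y'), (2, 'z'), (3, 'p'), (4, 'q'), (5, 'r')]
```

This only needs one pass to count rows per partition before indexing, which is why it fits a distributed setting better than a single global `zipWithIndex` over collected data.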
[jira] [Assigned] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42258:

    Assignee: Apache Spark

> pyspark.sql.functions should not expose typing.cast
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.1
> Reporter: Furcy Pin
> Assignee: Apache Spark
> Priority: Minor
>
> In PySpark, the `pyspark.sql.functions` module imports and exposes the function `typing.cast`. This may lead to errors from users that can be hard to spot.
>
> *Example*
> It took me a few minutes to understand why the following code:
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
> {code}
> which executes without any problem, gives the following result:
> {code:java}
> root
>  |-- a: integer (nullable = false)
> {code}
> This is because `f.cast` here calls `typing.cast`, and the correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema()
> {code}
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false)
> {code}
>
> *Suggestion of solution*
> Option 1: the functions imported in the module `pyspark.sql.functions` could be obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast
> {code}
> Option 2: only import `typing` and replace all occurrences of `cast` with `typing.cast`.
[jira] [Assigned] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42258:

    Assignee: (was: Apache Spark)

> pyspark.sql.functions should not expose typing.cast
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.1
> Reporter: Furcy Pin
> Priority: Minor
>
> In PySpark, the `pyspark.sql.functions` module imports and exposes the function `typing.cast`. This may lead to errors from users that can be hard to spot.
>
> *Example*
> It took me a few minutes to understand why the following code:
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
> {code}
> which executes without any problem, gives the following result:
> {code:java}
> root
>  |-- a: integer (nullable = false)
> {code}
> This is because `f.cast` here calls `typing.cast`, and the correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema()
> {code}
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false)
> {code}
>
> *Suggestion of solution*
> Option 1: the functions imported in the module `pyspark.sql.functions` could be obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast
> {code}
> Option 2: only import `typing` and replace all occurrences of `cast` with `typing.cast`.
[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696237#comment-17696237 ]

Apache Spark commented on SPARK-42258:

User 'FurcyPin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40271

> pyspark.sql.functions should not expose typing.cast
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.1
> Reporter: Furcy Pin
> Priority: Minor
>
> In PySpark, the `pyspark.sql.functions` module imports and exposes the function `typing.cast`. This may lead to errors from users that can be hard to spot.
>
> *Example*
> It took me a few minutes to understand why the following code:
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
> {code}
> which executes without any problem, gives the following result:
> {code:java}
> root
>  |-- a: integer (nullable = false)
> {code}
> This is because `f.cast` here calls `typing.cast`, and the correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema()
> {code}
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false)
> {code}
>
> *Suggestion of solution*
> Option 1: the functions imported in the module `pyspark.sql.functions` could be obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast
> {code}
> Option 2: only import `typing` and replace all occurrences of `cast` with `typing.cast`.
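The pitfall reported above is easy to reproduce without Spark at all, because `typing.cast` is a no-op at runtime: it returns its argument unchanged and exists only for static type checkers. A minimal sketch:

```python
from typing import cast

# typing.cast performs no runtime conversion; it only informs type checkers.
x = cast(str, 42)
print(type(x))  # <class 'int'> -- still an int, despite "casting" to str
```

So `f.cast("STRING", f.col("a"))` silently returned the column object untouched, which is why the schema stayed `integer` with no error raised.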
[jira] [Assigned] (SPARK-42497) Support of pandas API on Spark for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42497:

    Assignee: (was: Apache Spark)

> Support of pandas API on Spark for Spark Connect.
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.
[jira] [Commented] (SPARK-42497) Support of pandas API on Spark for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696206#comment-17696206 ]

Apache Spark commented on SPARK-42497:

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support of pandas API on Spark for Spark Connect.
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.
[jira] [Assigned] (SPARK-42497) Support of pandas API on Spark for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42497:

    Assignee: Apache Spark

> Support of pandas API on Spark for Spark Connect.
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Apache Spark
> Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.
[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases
[ https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696177#comment-17696177 ]

Apache Spark commented on SPARK-42500:

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40268

> ConstantPropagation support more cases
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Priority: Major
[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696144#comment-17696144 ]

Apache Spark commented on SPARK-42653:

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40267

> Artifact transfer from Scala/JVM client to Server
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Assignee: Venkata Sai Akhil Gudesa
> Priority: Major
> Fix For: 3.4.1
>
> In the decoupled client-server architecture of Spark Connect, a remote client may use a local JAR or a new class in their UDF that may not be present on the server. To handle these cases of missing "artifacts", we need to implement a mechanism to transfer artifacts from the client side over to the server side, as per the protocol defined in https://github.com/apache/spark/pull/40147
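Artifact transfer of this kind typically streams a local file to the server in fixed-size chunks. A hypothetical sketch of the client-side chunking (the function name and chunk size are illustrative, not the actual protocol from PR 40147):

```python
from typing import Iterator

CHUNK_SIZE = 32 * 1024  # illustrative; a real protocol chooses its own size


def chunk_artifact(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size chunks of an artifact's bytes for streaming upload."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


# A fake 100 KB "JAR" payload; the server reassembles the chunks losslessly.
payload = b"x" * 100_000
chunks = list(chunk_artifact(payload))
assert b"".join(chunks) == payload
```

Chunking keeps individual messages bounded in size, which matters for RPC transports that cap message length.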
[jira] [Commented] (SPARK-42660) Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)
[ https://issues.apache.org/jira/browse/SPARK-42660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696059#comment-17696059 ]

Apache Spark commented on SPARK-42660:

User 'mskapilks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40266

> Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)
>
> Key: SPARK-42660
> URL: https://issues.apache.org/jira/browse/SPARK-42660
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Kapil Singh
> Priority: Major
[jira] [Assigned] (SPARK-42660) Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)
[ https://issues.apache.org/jira/browse/SPARK-42660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42660:

    Assignee: Apache Spark

> Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)
>
> Key: SPARK-42660
> URL: https://issues.apache.org/jira/browse/SPARK-42660
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Kapil Singh
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-42660) Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)
[ https://issues.apache.org/jira/browse/SPARK-42660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42660:

    Assignee: (was: Apache Spark)

> Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)
>
> Key: SPARK-42660
> URL: https://issues.apache.org/jira/browse/SPARK-42660
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Kapil Singh
> Priority: Major
[jira] [Assigned] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.
[ https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42556:

    Assignee: Apache Spark

> Dataset.colregex should link a plan_id when it only matches a single column.
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Apache Spark
> Priority: Major
>
> When colregex returns a single column, it should link the plan's plan_id. For reference, here is the non-connect Dataset code that does this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
> This also needs to be fixed for the Python client.
[jira] [Assigned] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.
[ https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42556:

    Assignee: (was: Apache Spark)

> Dataset.colregex should link a plan_id when it only matches a single column.
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
>
> When colregex returns a single column, it should link the plan's plan_id. For reference, here is the non-connect Dataset code that does this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
> This also needs to be fixed for the Python client.
[jira] [Commented] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.
[ https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696046#comment-17696046 ]

Apache Spark commented on SPARK-42556:

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40265

> Dataset.colregex should link a plan_id when it only matches a single column.
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
>
> When colregex returns a single column, it should link the plan's plan_id. For reference, here is the non-connect Dataset code that does this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
> This also needs to be fixed for the Python client.
[jira] [Commented] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression
[ https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696042#comment-17696042 ]

Apache Spark commented on SPARK-42635:

User 'chenhao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40264

> Several counter-intuitive behaviours in the TimestampAdd expression
>
> Key: SPARK-42635
> URL: https://issues.apache.org/jira/browse/SPARK-42635
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Reporter: Chenhao Li
> Assignee: Chenhao Li
> Priority: Major
> Fix For: 3.4.1
>
> 1. When the time is close to a daylight saving time transition, the result may be discontinuous and not monotonic.
> We currently have:
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, timestamp'2011-03-12 03:00:00')").show
> +-------------------------------------------------------------------------+
> |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')|
> +-------------------------------------------------------------------------+
> |                                                      2011-03-13 03:59:59|
> +-------------------------------------------------------------------------+
> scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 03:00:00')").show
> +-------------------------------------------------------------------+
> |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')|
> +-------------------------------------------------------------------+
> |                                                2011-03-13 03:00:00|
> +-------------------------------------------------------------------+
> {code}
> In the second query, adding one more second sets the time back one hour instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due to the daylight saving time transition.
> The root cause of the problem is that the Spark code at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790 wrongly assumes every day has {{MICROS_PER_DAY}} microseconds and does the day and time-in-day split before looking at the timezone.
>
> 2. Adding month, quarter, and year silently ignores int overflow during unit conversion.
> The root cause is https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246: {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking for overflow. Note that we do have overflow checking when adding the amount to the timestamp, so the behavior is inconsistent.
> This can cause counter-intuitive results like this:
> {code:scala}
> scala> spark.sql("select timestampadd(quarter, 1431655764, timestamp'1970-01-01')").show
> +------------------------------------------------------------------+
> |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')|
> +------------------------------------------------------------------+
> |                                               1969-09-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> 3. Adding sub-month units (week, day, hour, minute, second, millisecond, microsecond) silently ignores long overflow during unit conversion.
> This is similar to the previous problem:
> {code:scala}
> scala> spark.sql("select timestampadd(day, 106751992, timestamp'1970-01-01')").show(false)
> +-------------------------------------------------------------+
> |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')|
> +-------------------------------------------------------------+
> |-290308-12-22 15:58:10.448384                                |
> +-------------------------------------------------------------+
> {code}
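Both behaviours described in the report can be reproduced outside Spark. A hypothetical Python sketch (Python 3.9+ for `zoneinfo`): absolute-time arithmetic across the 2011 spring-forward transition in America/Los_Angeles lands on a 04:00 wall clock rather than 03:00, and the quarter-to-month conversion overflow is plain 32-bit wraparound.

```python
import ctypes
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

la = ZoneInfo("America/Los_Angeles")
start = datetime(2011, 3, 12, 3, 0, tzinfo=la)  # PST, UTC-8

# Adding 24 * 3600 absolute seconds crosses the spring-forward gap
# (02:00 -> 03:00 on 2011-03-13), so the wall clock reads 04:00,
# not 03:00: that calendar day only had 23 hours.
end = (start.astimezone(timezone.utc)
       + timedelta(seconds=24 * 3600)).astimezone(la)
print(end)  # 2011-03-13 04:00:00-07:00

# The quarter example: 1431655764 quarters * 3 months/quarter
# overflows a 32-bit int, wrapping to -4 months -- four months
# *before* 1970-01-01, i.e. 1969-09-01.
months = ctypes.c_int32(1431655764 * 3).value
print(months)  # -4
```

This makes it clear the 1969-09-01 result is not timestamp arithmetic gone wrong but a silent truncation in the unit conversion step.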
[jira] [Assigned] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42659:

    Assignee: (was: Apache Spark)

> Reimplement `FPGrowthModel.transform` with dataframe operations
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Assigned] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42659:

    Assignee: Apache Spark

> Reimplement `FPGrowthModel.transform` with dataframe operations
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696019#comment-17696019 ]

Apache Spark commented on SPARK-42659:

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40263

> Reimplement `FPGrowthModel.transform` with dataframe operations
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
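The prediction rule behind `FPGrowthModel.transform` can be stated without Spark: for each input basket, take the union of the consequents of every association rule whose antecedent is contained in the basket, minus items already present. A hypothetical pure-Python sketch of that semantics (the data and helper are illustrative; the PR re-expresses this with DataFrame operations instead of a per-row UDF):

```python
# Hypothetical association rules as (antecedent, consequent) set pairs.
rules = [
    ({"bread", "butter"}, {"milk"}),
    ({"bread"}, {"jam"}),
]


def predict(basket: set) -> set:
    """Union of consequents of all firing rules, minus items already held."""
    out = set()
    for antecedent, consequent in rules:
        if antecedent <= basket:           # rule fires if antecedent is a subset
            out |= consequent - basket     # don't re-recommend owned items
    return out


print(predict({"bread", "butter"}))  # both rules fire -> {'milk', 'jam'}
```

Expressed over DataFrames, the subset test becomes a join/aggregation instead of a Python loop, which is what makes the reimplementation attractive.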
[jira] [Assigned] (SPARK-42651) Optimize global sort to driver sort
[ https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42651:

    Assignee: (was: Apache Spark)

> Optimize global sort to driver sort
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Priority: Major
>
> If the size of the plan is small enough, it's more efficient to sort all rows on the driver side, which saves one shuffle.
[jira] [Commented] (SPARK-42651) Optimize global sort to driver sort
[ https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695991#comment-17695991 ]

Apache Spark commented on SPARK-42651:

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40262

> Optimize global sort to driver sort
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Priority: Major
>
> If the size of the plan is small enough, it's more efficient to sort all rows on the driver side, which saves one shuffle.
[jira] [Assigned] (SPARK-42651) Optimize global sort to driver sort
[ https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42651:

    Assignee: Apache Spark

> Optimize global sort to driver sort
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Assignee: Apache Spark
> Priority: Major
>
> If the size of the plan is small enough, it's more efficient to sort all rows on the driver side, which saves one shuffle.
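The intuition behind the driver-sort optimization: a global sort normally requires a range-partitioning shuffle before each partition is sorted, but if the input is small enough to fit in one place, sorting it there skips the shuffle entirely. A hypothetical sketch of the size gate (the threshold name and value are illustrative, not a real Spark config):

```python
DRIVER_SORT_THRESHOLD = 1 << 20  # 1 MiB; illustrative only


def choose_sort_strategy(plan_size_bytes: int) -> str:
    """Pick a global-sort strategy based on estimated plan size."""
    if plan_size_bytes <= DRIVER_SORT_THRESHOLD:
        # Small input: one node sorts everything, zero shuffles.
        return "driver-sort"
    # Large input: range-partition (one shuffle), then sort each partition.
    return "shuffle-then-sort"


print(choose_sort_strategy(4096))  # driver-sort
```

The trade-off is the usual one for single-node shortcuts: the gate must be conservative, since a misestimated size would funnel too much data through one node.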
[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695943#comment-17695943 ] Apache Spark commented on SPARK-42615: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40261 > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695928#comment-17695928 ] Apache Spark commented on SPARK-42630: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40260 > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695927#comment-17695927 ] Apache Spark commented on SPARK-42630: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40260 > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42609) Add tests for grouping() and grouping_id() functions
[ https://issues.apache.org/jira/browse/SPARK-42609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42609: Assignee: Apache Spark (was: Rui Wang) > Add tests for grouping() and grouping_id() functions > > > Key: SPARK-42609 > URL: https://issues.apache.org/jira/browse/SPARK-42609 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42609) Add tests for grouping() and grouping_id() functions
[ https://issues.apache.org/jira/browse/SPARK-42609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42609: Assignee: Rui Wang (was: Apache Spark) > Add tests for grouping() and grouping_id() functions > > > Key: SPARK-42609 > URL: https://issues.apache.org/jira/browse/SPARK-42609 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42609) Add tests for grouping() and grouping_id() functions
[ https://issues.apache.org/jira/browse/SPARK-42609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695912#comment-17695912 ] Apache Spark commented on SPARK-42609: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40259 > Add tests for grouping() and grouping_id() functions > > > Key: SPARK-42609 > URL: https://issues.apache.org/jira/browse/SPARK-42609 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42655) Incorrect ambiguous column reference error
[ https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42655: Assignee: Apache Spark > Incorrect ambiguous column reference error > -- > > Key: SPARK-42655 > URL: https://issues.apache.org/jira/browse/SPARK-42655 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Assignee: Apache Spark >Priority: Major > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_same_case = List("id","col2","col3","col4", "col5", "id") > val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*) > df2.select("id").show() > > This query runs fine. > > But when we change the casing of the op_cols to have a mix of upper and lower > case ("id" & "ID"), it throws an ambiguous column reference error: > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID") > val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*) > df3.select("id").show() > org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could > be: id, id. 
> at > org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787) > at > org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > > Since Spark is case-insensitive, the second case should also work when we > have upper- and lower-case column names in the column list. > It also works fine in Spark 2.3. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42655) Incorrect ambiguous column reference error
[ https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695892#comment-17695892 ] Apache Spark commented on SPARK-42655: -- User 'shrprasa' has created a pull request for this issue: https://github.com/apache/spark/pull/40258 > Incorrect ambiguous column reference error > -- > > Key: SPARK-42655 > URL: https://issues.apache.org/jira/browse/SPARK-42655 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Priority: Major > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_same_case = List("id","col2","col3","col4", "col5", "id") > val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*) > df2.select("id").show() > > This query runs fine. > > But when we change the casing of the op_cols to have a mix of upper and lower > case ("id" & "ID"), it throws an ambiguous column reference error: > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID") > val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*) > df3.select("id").show() > org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could > be: id, id. 
> at > org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787) > at > org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > > Since Spark is case-insensitive, the second case should also work when we > have upper- and lower-case column names in the column list. > It also works fine in Spark 2.3. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42655) Incorrect ambiguous column reference error
[ https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42655: Assignee: (was: Apache Spark) > Incorrect ambiguous column reference error > -- > > Key: SPARK-42655 > URL: https://issues.apache.org/jira/browse/SPARK-42655 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Priority: Major > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_same_case = List("id","col2","col3","col4", "col5", "id") > val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*) > df2.select("id").show() > > This query runs fine. > > But when we change the casing of the op_cols to have a mix of upper and lower > case ("id" & "ID"), it throws an ambiguous column reference error: > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID") > val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*) > df3.select("id").show() > org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could > be: id, id. 
> at > org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787) > at > org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > > Since Spark is case-insensitive, the second case should also work when we > have upper- and lower-case column names in the column list. > It also works fine in Spark 2.3. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
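The behavior the report expects can be modeled in a few lines. This is a toy Python sketch, not Spark's Analyzer API; `expr_id` loosely mirrors Catalyst's notion of an attribute's exprId, and all names here are invented. The point is that the two matches for `id` in the mixed-case select are the *same* duplicated attribute, so resolution should not call them ambiguous:

```python
def resolve(name, attrs, case_sensitive=False):
    """Toy resolver. attrs: list of (column_name, expr_id) pairs; a duplicated
    select repeats the same expr_id under a possibly different-cased name."""
    norm = (lambda s: s) if case_sensitive else str.lower
    candidates = [(n, eid) for n, eid in attrs if norm(n) == norm(name)]
    # Matches sharing an expr_id are the same underlying column; only genuinely
    # distinct attributes should trigger an ambiguity error.
    if len({eid for _, eid in candidates}) > 1:
        raise ValueError(f"Reference '{name}' is ambiguous: {candidates}")
    return candidates[0] if candidates else None

# df1.select("id", ..., "ID") duplicates one attribute (expr_id 1) twice:
attrs = [("id", 1), ("col2", 2), ("col3", 3), ("col4", 4), ("col5", 5), ("ID", 1)]
print(resolve("id", attrs))  # ('id', 1): both matches share expr_id 1, no error
```

Under this model, an ambiguity error would still be raised correctly when `id` and `ID` come from different source columns, which is the case the check exists for.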
[jira] [Assigned] (SPARK-42656) Spark Connect Scala Client Shell Script
[ https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42656: Assignee: (was: Apache Spark) > Spark Connect Scala Client Shell Script > --- > > Key: SPARK-42656 > URL: https://issues.apache.org/jira/browse/SPARK-42656 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add a shell script that runs the Scala client in a Scala REPL, allowing users to > connect to Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42656) Spark Connect Scala Client Shell Script
[ https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42656: Assignee: Apache Spark > Spark Connect Scala Client Shell Script > --- > > Key: SPARK-42656 > URL: https://issues.apache.org/jira/browse/SPARK-42656 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > Add a shell script that runs the Scala client in a Scala REPL, allowing users to > connect to Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script
[ https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695879#comment-17695879 ] Apache Spark commented on SPARK-42656: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40257 > Spark Connect Scala Client Shell Script > --- > > Key: SPARK-42656 > URL: https://issues.apache.org/jira/browse/SPARK-42656 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add a shell script that runs the Scala client in a Scala REPL, allowing users to > connect to Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695820#comment-17695820 ] Apache Spark commented on SPARK-42653: -- User 'vicennial' has created a pull request for this issue: https://github.com/apache/spark/pull/40256 > Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-42653 > URL: https://issues.apache.org/jira/browse/SPARK-42653 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > In the decoupled client-server architecture of Spark Connect, a remote client > may use a local JAR or a new class in their UDF that may not be present on > the server. To handle these cases of missing "artifacts", we need to > implement a mechanism to transfer artifacts from the client side over to the > server side as per the protocol defined in > https://github.com/apache/spark/pull/40147 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42653: Assignee: (was: Apache Spark) > Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-42653 > URL: https://issues.apache.org/jira/browse/SPARK-42653 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > In the decoupled client-server architecture of Spark Connect, a remote client > may use a local JAR or a new class in their UDF that may not be present on > the server. To handle these cases of missing "artifacts", we need to > implement a mechanism to transfer artifacts from the client side over to the > server side as per the protocol defined in > https://github.com/apache/spark/pull/40147 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42653: Assignee: Apache Spark > Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-42653 > URL: https://issues.apache.org/jira/browse/SPARK-42653 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Apache Spark >Priority: Major > > In the decoupled client-server architecture of Spark Connect, a remote client > may use a local JAR or a new class in their UDF that may not be present on > the server. To handle these cases of missing "artifacts", we need to > implement a mechanism to transfer artifacts from the client side over to the > server side as per the protocol defined in > https://github.com/apache/spark/pull/40147 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695819#comment-17695819 ] Apache Spark commented on SPARK-42653: -- User 'vicennial' has created a pull request for this issue: https://github.com/apache/spark/pull/40256 > Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-42653 > URL: https://issues.apache.org/jira/browse/SPARK-42653 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > In the decoupled client-server architecture of Spark Connect, a remote client > may use a local JAR or a new class in their UDF that may not be present on > the server. To handle these cases of missing "artifacts", we need to > implement a mechanism to transfer artifacts from the client side over to the > server side as per the protocol defined in > https://github.com/apache/spark/pull/40147 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
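The client side of such a transfer typically streams an artifact's bytes in bounded chunks. The sketch below is a rough, hypothetical Python illustration of that chunking step only; the chunk size and names are invented, and the actual framing is defined by the protocol in the linked PR (https://github.com/apache/spark/pull/40147):

```python
CHUNK_SIZE = 32 * 1024  # invented for illustration; the real protocol defines its own framing

def chunk_artifact(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split an artifact (e.g. the bytes of a local JAR the server lacks)
    into ordered chunks suitable for streaming over an RPC."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

payload = b"\x00" * 70_000          # stand-in for a 70 KB JAR
chunks = chunk_artifact(payload)
assert b"".join(chunks) == payload  # server-side reassembly is lossless
print(len(chunks))  # 3
```

In practice the server would also need to verify integrity (e.g. via a checksum per artifact) and deduplicate artifacts it has already received, but those concerns are outside this sketch.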
[jira] [Assigned] (SPARK-42654) Upgrade dropwizard metrics 4.2.17
[ https://issues.apache.org/jira/browse/SPARK-42654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42654: Assignee: Apache Spark > Upgrade dropwizard metrics 4.2.17 > - > > Key: SPARK-42654 > URL: https://issues.apache.org/jira/browse/SPARK-42654 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > * [https://github.com/dropwizard/metrics/releases/tag/v4.2.16] > * [https://github.com/dropwizard/metrics/releases/tag/v4.2.17] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42558) Implement DataFrameStatFunctions
[ https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42558: Assignee: Apache Spark > Implement DataFrameStatFunctions > > > Key: SPARK-42558 > URL: https://issues.apache.org/jira/browse/SPARK-42558 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > Implement DataFrameStatFunctions for connect, and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42558) Implement DataFrameStatFunctions
[ https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695816#comment-17695816 ] Apache Spark commented on SPARK-42558: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40255 > Implement DataFrameStatFunctions > > > Key: SPARK-42558 > URL: https://issues.apache.org/jira/browse/SPARK-42558 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > Implement DataFrameStatFunctions for connect, and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42654) Upgrade dropwizard metrics 4.2.17
[ https://issues.apache.org/jira/browse/SPARK-42654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695814#comment-17695814 ] Apache Spark commented on SPARK-42654: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40254 > Upgrade dropwizard metrics 4.2.17 > - > > Key: SPARK-42654 > URL: https://issues.apache.org/jira/browse/SPARK-42654 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/dropwizard/metrics/releases/tag/v4.2.16] > * [https://github.com/dropwizard/metrics/releases/tag/v4.2.17] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42558) Implement DataFrameStatFunctions
[ https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42558: Assignee: (was: Apache Spark) > Implement DataFrameStatFunctions > > > Key: SPARK-42558 > URL: https://issues.apache.org/jira/browse/SPARK-42558 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > Implement DataFrameStatFunctions for connect, and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42654) Upgrade dropwizard metrics 4.2.17
[ https://issues.apache.org/jira/browse/SPARK-42654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42654: Assignee: (was: Apache Spark) > Upgrade dropwizard metrics 4.2.17 > - > > Key: SPARK-42654 > URL: https://issues.apache.org/jira/browse/SPARK-42654 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/dropwizard/metrics/releases/tag/v4.2.16] > * [https://github.com/dropwizard/metrics/releases/tag/v4.2.17] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42553) NonReserved keyword "interval" can't be column name
[ https://issues.apache.org/jira/browse/SPARK-42553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695696#comment-17695696 ] Apache Spark commented on SPARK-42553: -- User 'jiang13021' has created a pull request for this issue: https://github.com/apache/spark/pull/40253 > NonReserved keyword "interval" can't be column name > --- > > Key: SPARK-42553 > URL: https://issues.apache.org/jira/browse/SPARK-42553 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.2.3, 3.3.2 > Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_345) > Spark version 3.2.3-SNAPSHOT >Reporter: jiang13021 >Assignee: jiang13021 >Priority: Major > Fix For: 3.4.1 > > > INTERVAL is a Non-Reserved keyword in spark. "Non-Reserved keywords" have a > special meaning in particular contexts and can be used as identifiers in > other contexts. So by design, interval can be used as a column name. > {code:java} > scala> spark.sql("select interval from mytable") > org.apache.spark.sql.catalyst.parser.ParseException: > at least one time unit should be given for interval literal(line 1, pos 7)== > SQL == > select interval from mytable > ---^^^ at > org.apache.spark.sql.errors.QueryParsingErrors$.invalidIntervalLiteralError(QueryParsingErrors.scala:196) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$parseIntervalLiteral$1(AstBuilder.scala:2481) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.parseIntervalLiteral(AstBuilder.scala:2466) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2432) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2431) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57) > at > 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17308) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitIntervalLiteral(SqlBaseBaseVisitor.java:1581) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalLiteralContext.accept(SqlBaseParser.java:16929) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitConstantDefault(SqlBaseBaseVisitor.java:1511) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ConstantDefaultContext.accept(SqlBaseParser.java:15905) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitValueExpressionDefault(SqlBaseBaseVisitor.java:1392) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ValueExpressionDefaultContext.accept(SqlBaseParser.java:15298) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitPredicated$1(AstBuilder.scala:1548) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:1547) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:57) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$PredicatedContext.accept(SqlBaseParser.java:14745) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitExpression(SqlBaseBaseVisitor.java:1343) > at > 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$ExpressionContext.accept(SqlBaseParser.java:14606) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitNamedExpression$1(AstBuilder.scala:1434) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:1433) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:57) > at >
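The trace above shows the parser treating `interval` as the start of an interval literal rather than a column reference. Until the parser is fixed, a common client-side workaround is to backtick-quote the identifier, which forces Spark SQL to parse it as a column name. A minimal sketch of that idea (the helper function and keyword set below are hypothetical illustrations, not Spark APIs):

```python
# Hypothetical client-side workaround for SPARK-42553: backtick-quote a
# column name when it collides with a Spark SQL keyword such as INTERVAL,
# so the parser reads it as an identifier instead of an interval literal.
PROBLEMATIC_KEYWORDS = {"interval"}  # illustrative; not an exhaustive list

def quote_identifier(name: str) -> str:
    """Return `name` wrapped in backticks if it collides with a keyword."""
    if name.lower() in PROBLEMATIC_KEYWORDS:
        return f"`{name}`"
    return name

# Building the query from the bug report with the quoted identifier:
query = f"select {quote_identifier('interval')} from mytable"
```

With the backticks in place, `select `interval` from mytable` parses as a plain column reference, which matches the documented intent that non-reserved keywords remain usable as identifiers.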
[jira] [Commented] (SPARK-42555) Add JDBC to DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695691#comment-17695691 ] Apache Spark commented on SPARK-42555: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40252 > Add JDBC to DataFrameReader > --- > > Key: SPARK-42555 > URL: https://issues.apache.org/jira/browse/SPARK-42555 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42555) Add JDBC to DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42555: Assignee: Apache Spark > Add JDBC to DataFrameReader > --- > > Key: SPARK-42555 > URL: https://issues.apache.org/jira/browse/SPARK-42555 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42555) Add JDBC to DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42555: Assignee: (was: Apache Spark) > Add JDBC to DataFrameReader > --- > > Key: SPARK-42555 > URL: https://issues.apache.org/jira/browse/SPARK-42555 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695643#comment-17695643 ] Apache Spark commented on SPARK-41725: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40251 > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42649) Remove the standard Apache License header from the top of third-party source files
[ https://issues.apache.org/jira/browse/SPARK-42649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42649: Assignee: Apache Spark > Remove the standard Apache License header from the top of third-party source > files > -- > > Key: SPARK-42649 > URL: https://issues.apache.org/jira/browse/SPARK-42649 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.1.3, 2.2.3, > 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42649) Remove the standard Apache License header from the top of third-party source files
[ https://issues.apache.org/jira/browse/SPARK-42649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42649: Assignee: (was: Apache Spark) > Remove the standard Apache License header from the top of third-party source > files > -- > > Key: SPARK-42649 > URL: https://issues.apache.org/jira/browse/SPARK-42649 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.1.3, 2.2.3, > 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42648: Assignee: Apache Spark > Upgrade versions-maven-plugin to 2.15.0 > --- > > Key: SPARK-42648 > URL: https://issues.apache.org/jira/browse/SPARK-42648 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://github.com/mojohaus/versions/releases/tag/2.15.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42649) Remove the standard Apache License header from the top of third-party source files
[ https://issues.apache.org/jira/browse/SPARK-42649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695547#comment-17695547 ] Apache Spark commented on SPARK-42649: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40249 > Remove the standard Apache License header from the top of third-party source > files > -- > > Key: SPARK-42649 > URL: https://issues.apache.org/jira/browse/SPARK-42649 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.1.3, 2.2.3, > 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42648: Assignee: (was: Apache Spark) > Upgrade versions-maven-plugin to 2.15.0 > --- > > Key: SPARK-42648 > URL: https://issues.apache.org/jira/browse/SPARK-42648 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/mojohaus/versions/releases/tag/2.15.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695545#comment-17695545 ] Apache Spark commented on SPARK-42648: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40248 > Upgrade versions-maven-plugin to 2.15.0 > --- > > Key: SPARK-42648 > URL: https://issues.apache.org/jira/browse/SPARK-42648 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/mojohaus/versions/releases/tag/2.15.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
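For context, a plugin upgrade like this usually amounts to bumping one plugin entry (or version property) in the parent `pom.xml`. A hypothetical fragment is sketched below; it is not Spark's actual pom, though the `org.codehaus.mojo` coordinates are the plugin's standard ones on Maven Central:

```xml
<!-- Illustrative parent pom.xml fragment: bump versions-maven-plugin
     to 2.15.0 per SPARK-42648 (structure is a sketch, not Spark's pom) -->
<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>versions-maven-plugin</artifactId>
      <version>2.15.0</version>
    </plugin>
  </plugins>
</build>
```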
[jira] [Assigned] (SPARK-42642) Make Python the first code example tab in the Spark documentation
[ https://issues.apache.org/jira/browse/SPARK-42642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42642: Assignee: (was: Apache Spark) > Make Python the first code example tab in the Spark documentation > - > > Key: SPARK-42642 > URL: https://issues.apache.org/jira/browse/SPARK-42642 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Allan Folting >Priority: Major > Attachments: Screenshot 2023-03-01 at 8.10.08 PM.png, Screenshot > 2023-03-01 at 8.10.22 PM.png > > > Python is the most approachable and most popular language, so it should be the > default language in code examples. This change makes Python the first code example > tab consistently across the documentation, where applicable. > This continues the work started with: > https://issues.apache.org/jira/browse/SPARK-42493 > where these two pages were updated: > [https://spark.apache.org/docs/latest/sql-getting-started.html] > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > > Pages being updated now: > [https://spark.apache.org/docs/latest/ml-classification-regression.html] > [https://spark.apache.org/docs/latest/ml-clustering.html] > [https://spark.apache.org/docs/latest/ml-collaborative-filtering.html] > [https://spark.apache.org/docs/latest/ml-datasource.html] > [https://spark.apache.org/docs/latest/ml-features.html] > [https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html] > [https://spark.apache.org/docs/latest/ml-migration-guide.html] > [https://spark.apache.org/docs/latest/ml-pipeline.html] > [https://spark.apache.org/docs/latest/ml-statistics.html] > [https://spark.apache.org/docs/latest/ml-tuning.html] > > [https://spark.apache.org/docs/latest/mllib-clustering.html] > [https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html] > [https://spark.apache.org/docs/latest/mllib-data-types.html] > 
[https://spark.apache.org/docs/latest/mllib-decision-tree.html] > [https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html] > [https://spark.apache.org/docs/latest/mllib-ensembles.html] > [https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html] > [https://spark.apache.org/docs/latest/mllib-feature-extraction.html] > [https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html] > [https://spark.apache.org/docs/latest/mllib-isotonic-regression.html] > [https://spark.apache.org/docs/latest/mllib-linear-methods.html] > [https://spark.apache.org/docs/latest/mllib-naive-bayes.html] > [https://spark.apache.org/docs/latest/mllib-statistics.html] > > [https://spark.apache.org/docs/latest/quick-start.html] > > [https://spark.apache.org/docs/latest/rdd-programming-guide.html] > > [https://spark.apache.org/docs/latest/sql-data-sources-avro.html] > [https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html] > [https://spark.apache.org/docs/latest/sql-data-sources-csv.html] > [https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html] > [https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html] > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html] > [https://spark.apache.org/docs/latest/sql-data-sources-json.html] > [https://spark.apache.org/docs/latest/sql-data-sources-parquet.html] > sql-data-sources-protobuf.html > [https://spark.apache.org/docs/latest/sql-data-sources-text.html] > [https://spark.apache.org/docs/latest/sql-migration-guide.html] > [https://spark.apache.org/docs/latest/sql-performance-tuning.html] > [https://spark.apache.org/docs/latest/sql-ref-datatypes.html] > > [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html] > [https://spark.apache.org/docs/latest/streaming-programming-guide.html] > > [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html] > 
[https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html] > > > > > > > > > > > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42642) Make Python the first code example tab in the Spark documentation
[ https://issues.apache.org/jira/browse/SPARK-42642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42642: Assignee: Apache Spark > Make Python the first code example tab in the Spark documentation > - > > Key: SPARK-42642 > URL: https://issues.apache.org/jira/browse/SPARK-42642 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Allan Folting >Assignee: Apache Spark >Priority: Major > Attachments: Screenshot 2023-03-01 at 8.10.08 PM.png, Screenshot > 2023-03-01 at 8.10.22 PM.png > > > Python is the most approachable and most popular language, so it should be the > default language in code examples. This change makes Python the first code example > tab consistently across the documentation, where applicable. > This continues the work started with: > https://issues.apache.org/jira/browse/SPARK-42493 > where these two pages were updated: > [https://spark.apache.org/docs/latest/sql-getting-started.html] > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > > Pages being updated now: > [https://spark.apache.org/docs/latest/ml-classification-regression.html] > [https://spark.apache.org/docs/latest/ml-clustering.html] > [https://spark.apache.org/docs/latest/ml-collaborative-filtering.html] > [https://spark.apache.org/docs/latest/ml-datasource.html] > [https://spark.apache.org/docs/latest/ml-features.html] > [https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html] > [https://spark.apache.org/docs/latest/ml-migration-guide.html] > [https://spark.apache.org/docs/latest/ml-pipeline.html] > [https://spark.apache.org/docs/latest/ml-statistics.html] > [https://spark.apache.org/docs/latest/ml-tuning.html] > > [https://spark.apache.org/docs/latest/mllib-clustering.html] > [https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html] > [https://spark.apache.org/docs/latest/mllib-data-types.html] > 
[https://spark.apache.org/docs/latest/mllib-decision-tree.html] > [https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html] > [https://spark.apache.org/docs/latest/mllib-ensembles.html] > [https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html] > [https://spark.apache.org/docs/latest/mllib-feature-extraction.html] > [https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html] > [https://spark.apache.org/docs/latest/mllib-isotonic-regression.html] > [https://spark.apache.org/docs/latest/mllib-linear-methods.html] > [https://spark.apache.org/docs/latest/mllib-naive-bayes.html] > [https://spark.apache.org/docs/latest/mllib-statistics.html] > > [https://spark.apache.org/docs/latest/quick-start.html] > > [https://spark.apache.org/docs/latest/rdd-programming-guide.html] > > [https://spark.apache.org/docs/latest/sql-data-sources-avro.html] > [https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html] > [https://spark.apache.org/docs/latest/sql-data-sources-csv.html] > [https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html] > [https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html] > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html] > [https://spark.apache.org/docs/latest/sql-data-sources-json.html] > [https://spark.apache.org/docs/latest/sql-data-sources-parquet.html] > sql-data-sources-protobuf.html > [https://spark.apache.org/docs/latest/sql-data-sources-text.html] > [https://spark.apache.org/docs/latest/sql-migration-guide.html] > [https://spark.apache.org/docs/latest/sql-performance-tuning.html] > [https://spark.apache.org/docs/latest/sql-ref-datatypes.html] > > [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html] > [https://spark.apache.org/docs/latest/streaming-programming-guide.html] > > [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html] > 
[https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html] > > > > > > > > > > > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42642) Make Python the first code example tab in the Spark documentation
[ https://issues.apache.org/jira/browse/SPARK-42642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695544#comment-17695544 ] Apache Spark commented on SPARK-42642: -- User 'allanf-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40250 > Make Python the first code example tab in the Spark documentation > - > > Key: SPARK-42642 > URL: https://issues.apache.org/jira/browse/SPARK-42642 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Allan Folting >Priority: Major > Attachments: Screenshot 2023-03-01 at 8.10.08 PM.png, Screenshot > 2023-03-01 at 8.10.22 PM.png > > > Python is the most approachable and most popular language, so it should be the > default language in code examples. This change makes Python the first code example > tab consistently across the documentation, where applicable. > This continues the work started with: > https://issues.apache.org/jira/browse/SPARK-42493 > where these two pages were updated: > [https://spark.apache.org/docs/latest/sql-getting-started.html] > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > > Pages being updated now: > [https://spark.apache.org/docs/latest/ml-classification-regression.html] > [https://spark.apache.org/docs/latest/ml-clustering.html] > [https://spark.apache.org/docs/latest/ml-collaborative-filtering.html] > [https://spark.apache.org/docs/latest/ml-datasource.html] > [https://spark.apache.org/docs/latest/ml-features.html] > [https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html] > [https://spark.apache.org/docs/latest/ml-migration-guide.html] > [https://spark.apache.org/docs/latest/ml-pipeline.html] > [https://spark.apache.org/docs/latest/ml-statistics.html] > [https://spark.apache.org/docs/latest/ml-tuning.html] > > [https://spark.apache.org/docs/latest/mllib-clustering.html] > [https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html] > 
[https://spark.apache.org/docs/latest/mllib-data-types.html] > [https://spark.apache.org/docs/latest/mllib-decision-tree.html] > [https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html] > [https://spark.apache.org/docs/latest/mllib-ensembles.html] > [https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html] > [https://spark.apache.org/docs/latest/mllib-feature-extraction.html] > [https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html] > [https://spark.apache.org/docs/latest/mllib-isotonic-regression.html] > [https://spark.apache.org/docs/latest/mllib-linear-methods.html] > [https://spark.apache.org/docs/latest/mllib-naive-bayes.html] > [https://spark.apache.org/docs/latest/mllib-statistics.html] > > [https://spark.apache.org/docs/latest/quick-start.html] > > [https://spark.apache.org/docs/latest/rdd-programming-guide.html] > > [https://spark.apache.org/docs/latest/sql-data-sources-avro.html] > [https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html] > [https://spark.apache.org/docs/latest/sql-data-sources-csv.html] > [https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html] > [https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html] > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html] > [https://spark.apache.org/docs/latest/sql-data-sources-json.html] > [https://spark.apache.org/docs/latest/sql-data-sources-parquet.html] > sql-data-sources-protobuf.html > [https://spark.apache.org/docs/latest/sql-data-sources-text.html] > [https://spark.apache.org/docs/latest/sql-migration-guide.html] > [https://spark.apache.org/docs/latest/sql-performance-tuning.html] > [https://spark.apache.org/docs/latest/sql-ref-datatypes.html] > > [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html] > [https://spark.apache.org/docs/latest/streaming-programming-guide.html] > > [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html] > 
[https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html] > > > > > > > > > > > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42647) Remove aliases from deprecated numpy data types
[ https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42647: Assignee: (was: Apache Spark) > Remove aliases from deprecated numpy data types > --- > > Key: SPARK-42647 > URL: https://issues.apache.org/jira/browse/SPARK-42647 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0, 3.4.1 >Reporter: Aimilios Tsouvelekakis >Priority: Major > > Numpy has started changing the aliases of some of its data types. This means > that users with the latest version of numpy will face either warnings or > errors depending on the type that they are using. This affects all users > using numpy > 1.20.0. One of the types was fixed back in September with this > [pull|https://github.com/apache/spark/pull/37817] request. > The problem can be split into 2 types: > [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type > aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, > np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually > be removed. As of numpy 1.25.0 they give a warning. > [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases > of builtin types like np.int is deprecated since numpy version 1.20.0 and > removed since numpy version 1.24.0. > The changes are needed so pyspark can be compatible with the latest numpy and > avoid > * attribute errors on data types deprecated from version 1.20.0: > [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations] > * warnings on deprecated data types from version 1.24.0: > [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations] > > From my main research I see the following: > The only functional changes are related to the conversion.py file. > The rest of the changes are inside tests, in the user_guide, or in some > docstrings describing specific functions. Since I am not an expert in these > tests, I will wait for the reviewer and people with more experience in the > pyspark code. > These types are aliases for classic python types, so they should work with > all numpy versions > [1|https://numpy.org/devdocs/release/1.20.0-notes.html], > [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python]. > The error or warning comes from the call to numpy. > > For the versions, I chose to include 3.3 and onwards, but I see that 3.2 is > also still in the 18-month maintenance cadence, as it was released in > October 2021. > > The pull request: [https://github.com/apache/spark/pull/40220] > Best Regards, > Aimilios -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42647) Remove aliases from deprecated numpy data types
[ https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695520#comment-17695520 ] Apache Spark commented on SPARK-42647: -- User 'aimtsou' has created a pull request for this issue: https://github.com/apache/spark/pull/40220 > Remove aliases from deprecated numpy data types > --- > > Key: SPARK-42647 > URL: https://issues.apache.org/jira/browse/SPARK-42647 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0, 3.4.1 >Reporter: Aimilios Tsouvelekakis >Priority: Major > > Numpy has started changing the aliases of some of its data types. This means > that users with the latest version of numpy will face either warnings or > errors depending on the type that they are using. This affects all users > using numpy > 1.20.0. One of the types was fixed back in September with this > [pull|https://github.com/apache/spark/pull/37817] request. > The problem can be split into 2 types: > [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type > aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, > np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually > be removed. As of numpy 1.25.0 they give a warning. > [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases > of builtin types like np.int is deprecated since numpy version 1.20.0 and > removed since numpy version 1.24.0. > The changes are needed so pyspark can be compatible with the latest numpy and > avoid > * attribute errors on data types deprecated from version 1.20.0: > [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations] > * warnings on deprecated data types from version 1.24.0: > [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations] > > From my main research I see the following: > The only functional changes are related to the conversion.py file. > The rest of the changes are inside tests, in the user_guide, or in some > docstrings describing specific functions. Since I am not an expert in these > tests, I will wait for the reviewer and people with more experience in the > pyspark code. > These types are aliases for classic python types, so they should work with > all numpy versions > [1|https://numpy.org/devdocs/release/1.20.0-notes.html], > [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python]. > The error or warning comes from the call to numpy. > > For the versions, I chose to include 3.3 and onwards, but I see that 3.2 is > also still in the 18-month maintenance cadence, as it was released in > October 2021. > > The pull request: [https://github.com/apache/spark/pull/40220] > Best Regards, > Aimilios -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
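The alias cleanup described above can be summarized as a name-for-name substitution. The sketch below is illustrative only (the mapping is compiled from the NumPy 1.20.0 and 1.24.0 release notes, and the helper is hypothetical, not a PySpark API):

```python
# Deprecated NumPy aliases and their supported replacements.
# Builtin-type aliases (np.int, np.float, ...) were deprecated in NumPy
# 1.20.0 and removed in 1.24.0; the 0-bit-sized scalar aliases (np.int0,
# np.bool8, ...) were deprecated in 1.24.0.
DEPRECATED_NUMPY_ALIASES = {
    # aliases of Python builtins -> use the builtin directly
    "np.bool": "bool",
    "np.int": "int",
    "np.float": "float",
    "np.object": "object",
    "np.str": "str",
    # 0-bit-sized scalar aliases -> explicit NumPy scalar types
    "np.object0": "np.object_",
    "np.str0": "np.str_",
    "np.bytes0": "np.bytes_",
    "np.void0": "np.void",
    "np.int0": "np.intp",
    "np.uint0": "np.uintp",
    "np.bool8": "np.bool_",
}

def migrate_alias(name: str) -> str:
    """Return the supported spelling for a possibly-deprecated alias."""
    return DEPRECATED_NUMPY_ALIASES.get(name, name)
```

Applying such a substitution across code, tests, and docstrings is essentially what the linked pull request does by hand.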
[jira] [Assigned] (SPARK-42647) Remove aliases from deprecated numpy data types
[ https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42647: Assignee: Apache Spark > Remove aliases from deprecated numpy data types > --- > > Key: SPARK-42647 > URL: https://issues.apache.org/jira/browse/SPARK-42647 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0, 3.4.1 >Reporter: Aimilios Tsouvelekakis >Assignee: Apache Spark >Priority: Major > > Numpy has started changing the aliases of some of its data types. This means > that users with the latest version of numpy will face either warnings or > errors depending on the type that they are using. This affects all users > using numpy > 1.20.0. One of the types was fixed back in September with this > [pull|https://github.com/apache/spark/pull/37817] request. > The problem can be split into 2 types: > [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type > aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, > np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually > be removed. As of numpy 1.25.0 they give a warning. > [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases > of builtin types like np.int is deprecated since numpy version 1.20.0 and > removed since numpy version 1.24.0. > The changes are needed so pyspark can be compatible with the latest numpy and > avoid > * attribute errors on data types deprecated from version 1.20.0: > [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations] > * warnings on deprecated data types from version 1.24.0: > [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations] > > From my main research I see the following: > The only functional changes are related to the conversion.py file. > The rest of the changes are inside tests, in the user_guide, or in some > docstrings describing specific functions. Since I am not an expert in these > tests, I will wait for the reviewer and people with more experience in the > pyspark code. > These types are aliases for classic python types, so they should work with > all numpy versions > [1|https://numpy.org/devdocs/release/1.20.0-notes.html], > [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python]. > The error or warning comes from the call to numpy. > > For the versions, I chose to include 3.3 and onwards, but I see that 3.2 is > also still in the 18-month maintenance cadence, as it was released in > October 2021. > > The pull request: [https://github.com/apache/spark/pull/40220] > Best Regards, > Aimilios -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42646) Upgrade cyclonedx from 2.7.3 to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695465#comment-17695465 ] Apache Spark commented on SPARK-42646: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40247 > Upgrade cyclonedx from 2.7.3 to 2.7.5 > > > Key: SPARK-42646 > URL: https://issues.apache.org/jira/browse/SPARK-42646 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > > !https://user-images.githubusercontent.com/15246973/222338040-d7c8d595-be0b-40bb-af49-6b260dc0c425.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42646) Upgrad cyclonedx from 2.7.3 to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42646: Assignee: Apache Spark > Upgrad cyclonedx from 2.7.3 to 2.7.5 > > > Key: SPARK-42646 > URL: https://issues.apache.org/jira/browse/SPARK-42646 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > > !https://user-images.githubusercontent.com/15246973/222338040-d7c8d595-be0b-40bb-af49-6b260dc0c425.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42646) Upgrad cyclonedx from 2.7.3 to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42646: Assignee: (was: Apache Spark) > Upgrad cyclonedx from 2.7.3 to 2.7.5 > > > Key: SPARK-42646 > URL: https://issues.apache.org/jira/browse/SPARK-42646 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > > !https://user-images.githubusercontent.com/15246973/222338040-d7c8d595-be0b-40bb-af49-6b260dc0c425.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42644) Add `hive` dependency to `connect` module
[ https://issues.apache.org/jira/browse/SPARK-42644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42644: Assignee: (was: Apache Spark) > Add `hive` dependency to `connect` module > - > > Key: SPARK-42644 > URL: https://issues.apache.org/jira/browse/SPARK-42644 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42644) Add `hive` dependency to `connect` module
[ https://issues.apache.org/jira/browse/SPARK-42644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42644: Assignee: Apache Spark > Add `hive` dependency to `connect` module > - > > Key: SPARK-42644 > URL: https://issues.apache.org/jira/browse/SPARK-42644 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42644) Add `hive` dependency to `connect` module
[ https://issues.apache.org/jira/browse/SPARK-42644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695464#comment-17695464 ] Apache Spark commented on SPARK-42644: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40246 > Add `hive` dependency to `connect` module > - > > Key: SPARK-42644 > URL: https://issues.apache.org/jira/browse/SPARK-42644 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42643) Implement `spark.udf.registerJavaFunction`
[ https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695429#comment-17695429 ] Apache Spark commented on SPARK-42643: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40244 > Implement `spark.udf.registerJavaFunction` > -- > > Key: SPARK-42643 > URL: https://issues.apache.org/jira/browse/SPARK-42643 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `spark.udf.registerJavaFunction`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42643) Implement `spark.udf.registerJavaFunction`
[ https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42643: Assignee: Apache Spark > Implement `spark.udf.registerJavaFunction` > -- > > Key: SPARK-42643 > URL: https://issues.apache.org/jira/browse/SPARK-42643 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement `spark.udf.registerJavaFunction`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42643) Implement `spark.udf.registerJavaFunction`
[ https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42643: Assignee: (was: Apache Spark) > Implement `spark.udf.registerJavaFunction` > -- > > Key: SPARK-42643 > URL: https://issues.apache.org/jira/browse/SPARK-42643 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `spark.udf.registerJavaFunction`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695428#comment-17695428 ] Apache Spark commented on SPARK-41823: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40245 > DataFrame.join creating ambiguous column names > -- > > Key: SPARK-41823 > URL: https://issues.apache.org/jira/browse/SPARK-41823 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 254, in pyspark.sql.connect.dataframe.DataFrame.drop > Failed example: > df.join(df2, df.name == df2.name, 'inner').drop('name').show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.join(df2, df.name == df2.name, 'inner').drop('name').show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in 
_handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, > `name`]. > Plan: {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42641) Upgrade buf to v1.15.0
[ https://issues.apache.org/jira/browse/SPARK-42641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42641: Assignee: (was: Apache Spark) > Upgrade buf to v1.15.0 > -- > > Key: SPARK-42641 > URL: https://issues.apache.org/jira/browse/SPARK-42641 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42641) Upgrade buf to v1.15.0
[ https://issues.apache.org/jira/browse/SPARK-42641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42641: Assignee: Apache Spark > Upgrade buf to v1.15.0 > -- > > Key: SPARK-42641 > URL: https://issues.apache.org/jira/browse/SPARK-42641 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42641) Upgrade buf to v1.15.0
[ https://issues.apache.org/jira/browse/SPARK-42641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695404#comment-17695404 ] Apache Spark commented on SPARK-42641: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40243 > Upgrade buf to v1.15.0 > -- > > Key: SPARK-42641 > URL: https://issues.apache.org/jira/browse/SPARK-42641 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42640) Remove stale entries from the excluding rules for CompatibilitySuite
[ https://issues.apache.org/jira/browse/SPARK-42640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42640: Assignee: Apache Spark (was: Rui Wang) > Remove stale entries from the excluding rules for CompatibilitySuite > -- > > Key: SPARK-42640 > URL: https://issues.apache.org/jira/browse/SPARK-42640 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42640) Remove stale entries from the excluding rules for CompatibilitySuite
[ https://issues.apache.org/jira/browse/SPARK-42640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42640: Assignee: Rui Wang (was: Apache Spark) > Remove stale entries from the excluding rules for CompatibilitySuite > -- > > Key: SPARK-42640 > URL: https://issues.apache.org/jira/browse/SPARK-42640 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42640) Remove stale entries from the excluding rules for CompatibilitySuite
[ https://issues.apache.org/jira/browse/SPARK-42640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695387#comment-17695387 ] Apache Spark commented on SPARK-42640: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40241 > Remove stale entries from the excluding rules for CompatibilitySuite > -- > > Key: SPARK-42640 > URL: https://issues.apache.org/jira/browse/SPARK-42640 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42639) Add createDataFrame/createDataset to SparkSession
[ https://issues.apache.org/jira/browse/SPARK-42639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695386#comment-17695386 ] Apache Spark commented on SPARK-42639: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40242 > Add createDataFrame/createDataset to SparkSession > - > > Key: SPARK-42639 > URL: https://issues.apache.org/jira/browse/SPARK-42639 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > Add createDataFrame/createDataset to SparkSession -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42639) Add createDataFrame/createDataset to SparkSession
[ https://issues.apache.org/jira/browse/SPARK-42639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42639: Assignee: Apache Spark (was: Herman van Hövell) > Add createDataFrame/createDataset to SparkSession > - > > Key: SPARK-42639 > URL: https://issues.apache.org/jira/browse/SPARK-42639 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > Add createDataFrame/createDataset to SparkSession -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42639) Add createDataFrame/createDataset to SparkSession
[ https://issues.apache.org/jira/browse/SPARK-42639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42639: Assignee: Herman van Hövell (was: Apache Spark) > Add createDataFrame/createDataset to SparkSession > - > > Key: SPARK-42639 > URL: https://issues.apache.org/jira/browse/SPARK-42639 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > Add createDataFrame/createDataset to SparkSession -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42458) createDataFrame should support DDL string as schema
[ https://issues.apache.org/jira/browse/SPARK-42458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42458: Assignee: Apache Spark > createDataFrame should support DDL string as schema > --- > > Key: SPARK-42458 > URL: https://issues.apache.org/jira/browse/SPARK-42458 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > {code:python} > File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in > pyspark.sql.connect.readwriter.DataFrameWriter.option > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with 'nullValue' option set to > 'Hyukjin Kwon'. > df = spark.createDataFrame([(100, None)], "age INT, name STRING") > df.write.option("nullValue", "Hyukjin > Kwon").mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame. > spark.read.schema(df.schema).format('csv').load(d).show() > Exception raised: > Traceback (most recent call last): > File "/.../lib/python3.9/doctest.py", line 1334, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.option[2]>", line 3, in > > df = spark.createDataFrame([(100, None)], "age INT, name STRING") > File "/.../python/pyspark/sql/connect/session.py", line 312, in > createDataFrame > raise ValueError( > ValueError: Some of types cannot be determined after inferring, a > StructType Schema is required in this case > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42458) createDataFrame should support DDL string as schema
[ https://issues.apache.org/jira/browse/SPARK-42458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695351#comment-17695351 ] Apache Spark commented on SPARK-42458: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40240 > createDataFrame should support DDL string as schema > --- > > Key: SPARK-42458 > URL: https://issues.apache.org/jira/browse/SPARK-42458 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > > {code:python} > File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in > pyspark.sql.connect.readwriter.DataFrameWriter.option > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with 'nullValue' option set to > 'Hyukjin Kwon'. > df = spark.createDataFrame([(100, None)], "age INT, name STRING") > df.write.option("nullValue", "Hyukjin > Kwon").mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame. > spark.read.schema(df.schema).format('csv').load(d).show() > Exception raised: > Traceback (most recent call last): > File "/.../lib/python3.9/doctest.py", line 1334, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.option[2]>", line 3, in > > df = spark.createDataFrame([(100, None)], "age INT, name STRING") > File "/.../python/pyspark/sql/connect/session.py", line 312, in > createDataFrame > raise ValueError( > ValueError: Some of types cannot be determined after inferring, a > StructType Schema is required in this case > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42458) createDataFrame should support DDL string as schema
[ https://issues.apache.org/jira/browse/SPARK-42458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42458: Assignee: (was: Apache Spark) > createDataFrame should support DDL string as schema > --- > > Key: SPARK-42458 > URL: https://issues.apache.org/jira/browse/SPARK-42458 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > > {code:python} > File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in > pyspark.sql.connect.readwriter.DataFrameWriter.option > Failed example: > with tempfile.TemporaryDirectory() as d: > # Write a DataFrame into a CSV file with 'nullValue' option set to > 'Hyukjin Kwon'. > df = spark.createDataFrame([(100, None)], "age INT, name STRING") > df.write.option("nullValue", "Hyukjin > Kwon").mode("overwrite").format("csv").save(d) > # Read the CSV file as a DataFrame. > spark.read.schema(df.schema).format('csv').load(d).show() > Exception raised: > Traceback (most recent call last): > File "/.../lib/python3.9/doctest.py", line 1334, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.option[2]>", line 3, in > > df = spark.createDataFrame([(100, None)], "age INT, name STRING") > File "/.../python/pyspark/sql/connect/session.py", line 312, in > createDataFrame > raise ValueError( > ValueError: Some of types cannot be determined after inferring, a > StructType Schema is required in this case > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
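The DDL schema string in the doctest above, "age INT, name STRING", is a comma-separated list of `name TYPE` pairs. As a rough illustration of the format only (Spark's real DDL parser also handles nested and parameterized types such as `ARRAY<INT>` or `DECIMAL(10,2)`; `parse_flat_ddl` is a hypothetical helper, not a Spark API):

```python
def parse_flat_ddl(ddl: str) -> list[tuple[str, str]]:
    """Split a flat DDL schema string into (column, type) pairs.

    Illustrative sketch only: assumes no nested or parameterized
    types, which would require a real tokenizer.
    """
    fields = []
    for part in ddl.split(","):
        name, _, type_name = part.strip().partition(" ")
        fields.append((name, type_name.strip().upper()))
    return fields

print(parse_flat_ddl("age INT, name STRING"))
# [('age', 'INT'), ('name', 'STRING')]
```

Accepting such strings lets `createDataFrame` take the schema inline instead of requiring a constructed `StructType`.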
[jira] [Commented] (SPARK-42633) Use the actual schema in a LocalRelation
[ https://issues.apache.org/jira/browse/SPARK-42633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695280#comment-17695280 ] Apache Spark commented on SPARK-42633: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40238 > Use the actual schema in a LocalRelation > > > Key: SPARK-42633 > URL: https://issues.apache.org/jira/browse/SPARK-42633 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > Make the LocalRelation proto take an actual schema message instead of a > string. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42637) Add SparkSession.stop
[ https://issues.apache.org/jira/browse/SPARK-42637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695279#comment-17695279 ] Apache Spark commented on SPARK-42637: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40239 > Add SparkSession.stop > - > > Key: SPARK-42637 > URL: https://issues.apache.org/jira/browse/SPARK-42637 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > Add SparkSession.stop() -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression
[ https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42635: Assignee: (was: Apache Spark) > Several counter-intuitive behaviours in the TimestampAdd expression > --- > > Key: SPARK-42635 > URL: https://issues.apache.org/jira/browse/SPARK-42635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2 >Reporter: Chenhao Li >Priority: Major > > # When the time is close to daylight saving time transition, the result may > be discontinuous and not monotonic. > We currently have: > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, > timestamp'2011-03-12 03:00:00')").show > ++ > |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')| > ++ > | 2011-03-13 03:59:59| > ++ > scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 > 03:00:00')").show > +--+ > |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')| > +--+ > | 2011-03-13 03:00:00| > +--+ {code} > > In the second query, adding one more second will set the time back one hour > instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 > 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due > to the daylight saving time transition. > The root cause of the problem is the Spark code at > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790] > wrongly assumes every day has {{MICROS_PER_DAY}} seconds, and does the day > and time-in-day split before looking at the timezone. > 2. Adding month, quarter, and year silently ignores Int overflow during unit > conversion. > The root cause is > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246]. 
> {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking > overflow. Note that we do have overflow checking in adding the amount to the > timestamp, so the behavior is inconsistent. > This can cause counter-intuitive results like this: > {code:scala} > scala> spark.sql("select timestampadd(quarter, 1431655764, > timestamp'1970-01-01')").show > +--+ > |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')| > +--+ > | 1969-09-01 00:00:00| > +--+{code} > 3. Adding sub-month units (week, day, hour, minute, second, millisecond, > microsecond)silently ignores Long overflow during unit conversion. > This is similar to the previous problem: > {code:scala} > scala> spark.sql("select timestampadd(day, 106751992, > timestamp'1970-01-01')").show(false) > +-+ > |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')| > +-+ > |-290308-12-22 15:58:10.448384| > +-+{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
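The DST discontinuity described in point 1 can be reproduced outside Spark: splitting a timestamp into a day and a time-in-day before consulting the timezone effectively does wall-clock arithmetic, while counting elapsed seconds through UTC respects the skipped hour. A small sketch of the two interpretations (assuming tzdata is available to `zoneinfo`):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

la = ZoneInfo("America/Los_Angeles")
start = datetime(2011, 3, 12, 3, 0, 0, tzinfo=la)

# Wall-clock addition: the same local time the next day, even though
# an hour was skipped at the 2011-03-13 spring-forward transition.
wall = start + timedelta(seconds=24 * 3600)

# Absolute addition: go through UTC so all 24 * 3600 elapsed seconds
# are counted; the DST gap pushes the local result to 04:00.
absolute = (start.astimezone(timezone.utc)
            + timedelta(seconds=24 * 3600)).astimezone(la)

assert wall.hour == 3 and absolute.hour == 4
# Only 23 * 3600 real seconds separate the wall-clock result from start.
elapsed = wall.astimezone(timezone.utc) - start.astimezone(timezone.utc)
assert elapsed.total_seconds() == 23 * 3600
```

The bug report's `timestampadd` outputs match the wall-clock branch, which is why adding one more second can appear to move the clock back an hour.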
[jira] [Commented] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression
[ https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695278#comment-17695278 ] Apache Spark commented on SPARK-42635: -- User 'chenhao-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40237 > Several counter-intuitive behaviours in the TimestampAdd expression > --- > > Key: SPARK-42635 > URL: https://issues.apache.org/jira/browse/SPARK-42635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2 >Reporter: Chenhao Li >Priority: Major > > # When the time is close to daylight saving time transition, the result may > be discontinuous and not monotonic. > We currently have: > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, > timestamp'2011-03-12 03:00:00')").show > ++ > |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')| > ++ > | 2011-03-13 03:59:59| > ++ > scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 > 03:00:00')").show > +--+ > |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')| > +--+ > | 2011-03-13 03:00:00| > +--+ {code} > > In the second query, adding one more second will set the time back one hour > instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 > 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due > to the daylight saving time transition. > The root cause of the problem is the Spark code at > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790] > wrongly assumes every day has {{MICROS_PER_DAY}} seconds, and does the day > and time-in-day split before looking at the timezone. > 2. Adding month, quarter, and year silently ignores Int overflow during unit > conversion. 
> The root cause is > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246]. > {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking > overflow. Note that we do have overflow checking in adding the amount to the > timestamp, so the behavior is inconsistent. > This can cause counter-intuitive results like this: > {code:scala} > scala> spark.sql("select timestampadd(quarter, 1431655764, > timestamp'1970-01-01')").show > +--+ > |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')| > +--+ > | 1969-09-01 00:00:00| > +--+{code} > 3. Adding sub-month units (week, day, hour, minute, second, millisecond, > microsecond)silently ignores Long overflow during unit conversion. > This is similar to the previous problem: > {code:scala} > scala> spark.sql("select timestampadd(day, 106751992, > timestamp'1970-01-01')").show(false) > +-+ > |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')| > +-+ > |-290308-12-22 15:58:10.448384| > +-+{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression
[ https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42635: Assignee: Apache Spark > Several counter-intuitive behaviours in the TimestampAdd expression > --- > > Key: SPARK-42635 > URL: https://issues.apache.org/jira/browse/SPARK-42635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2 >Reporter: Chenhao Li >Assignee: Apache Spark >Priority: Major > > # When the time is close to daylight saving time transition, the result may > be discontinuous and not monotonic. > We currently have: > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, > timestamp'2011-03-12 03:00:00')").show > ++ > |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')| > ++ > | 2011-03-13 03:59:59| > ++ > scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 > 03:00:00')").show > +--+ > |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')| > +--+ > | 2011-03-13 03:00:00| > +--+ {code} > > In the second query, adding one more second will set the time back one hour > instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 > 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due > to the daylight saving time transition. > The root cause of the problem is the Spark code at > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790] > wrongly assumes every day has {{MICROS_PER_DAY}} seconds, and does the day > and time-in-day split before looking at the timezone. > 2. Adding month, quarter, and year silently ignores Int overflow during unit > conversion. 
> The root cause is > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246]. > {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking > overflow. Note that we do have overflow checking in adding the amount to the > timestamp, so the behavior is inconsistent. > This can cause counter-intuitive results like this: > {code:scala} > scala> spark.sql("select timestampadd(quarter, 1431655764, > timestamp'1970-01-01')").show > +--+ > |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')| > +--+ > | 1969-09-01 00:00:00| > +--+{code} > 3. Adding sub-month units (week, day, hour, minute, second, millisecond, > microsecond)silently ignores Long overflow during unit conversion. > This is similar to the previous problem: > {code:scala} > scala> spark.sql("select timestampadd(day, 106751992, > timestamp'1970-01-01')").show(false) > +-+ > |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')| > +-+ > |-290308-12-22 15:58:10.448384| > +-+{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
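The quarter example above follows from 32-bit wraparound in the quarters-to-months conversion: 1431655764 * 3 = 4294967292 = 2^32 - 4, which a signed 32-bit int reads as -4, so four months are subtracted from 1970-01-01, giving 1969-09-01. Checking the arithmetic:

```python
import ctypes
from datetime import date

quarters = 1431655764
# Multiplying without an overflow check wraps in 32-bit arithmetic.
months = ctypes.c_int32(quarters * 3).value
assert quarters * 3 == 4294967292
assert months == -4

# Adding -4 months to 1970-01-01 lands on 1969-09-01, matching the
# counter-intuitive result in the report.
start = date(1970, 1, 1)
year, month0 = divmod((start.year * 12 + start.month - 1) + months, 12)
assert date(year, month0 + 1, start.day) == date(1969, 9, 1)
```

The same wraparound analysis applies to the sub-month units in point 3, with 64-bit instead of 32-bit arithmetic.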
[jira] [Commented] (SPARK-38735) Test the error class: INTERNAL_ERROR
[ https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695259#comment-17695259 ] Apache Spark commented on SPARK-38735:

User 'the8thC' has created a pull request for this issue: https://github.com/apache/spark/pull/40236

> Test the error class: INTERNAL_ERROR
> ------------------------------------
>
> Key: SPARK-38735
> URL: https://issues.apache.org/jira/browse/SPARK-38735
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
> Add tests for the error class *INTERNAL_ERROR* to QueryExecutionErrorsSuite. The tests should cover the exceptions thrown in QueryExecutionErrors:
> {code:scala}
>   def logicalHintOperatorNotRemovedDuringAnalysisError(): Throwable = {
>     new SparkIllegalStateException(errorClass = "INTERNAL_ERROR",
>       messageParameters = Array(
>         "Internal error: logical hint operator should have been removed during analysis"))
>   }
>
>   def cannotEvaluateExpressionError(expression: Expression): Throwable = {
>     new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>       messageParameters = Array(s"Cannot evaluate expression: $expression"))
>   }
>
>   def cannotGenerateCodeForExpressionError(expression: Expression): Throwable = {
>     new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>       messageParameters = Array(s"Cannot generate code for expression: $expression"))
>   }
>
>   def cannotTerminateGeneratorError(generator: UnresolvedGenerator): Throwable = {
>     new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>       messageParameters = Array(s"Cannot terminate expression: $generator"))
>   }
>
>   def methodNotDeclaredError(name: String): Throwable = {
>     new SparkNoSuchMethodException(errorClass = "INTERNAL_ERROR",
>       messageParameters = Array(
>         s"""A method named "$name" is not declared in any enclosing class nor any supertype"""))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must check:+
> # the entire error message
> # the sqlState, if it is defined in the error-classes.json file
> # the error class
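As a rough illustration of the required checks, here is a self-contained sketch. The exception class below is a simplified stand-in for Spark's actual {{SparkIllegalStateException}} (a real test would live in QueryExecutionErrorsSuite and use that suite's error-checking helpers; the sqlState check is not modelled here):

```scala
// Simplified stand-in for Spark's error-class exceptions (illustration only).
class SparkIllegalStateException(
    val errorClass: String,
    val messageParameters: Array[String])
  extends IllegalStateException(messageParameters.mkString(", "))

// Mirrors the factory method quoted in the issue description.
def logicalHintOperatorNotRemovedDuringAnalysisError(): Throwable =
  new SparkIllegalStateException(
    errorClass = "INTERNAL_ERROR",
    messageParameters = Array(
      "Internal error: logical hint operator should have been removed during analysis"))

// The checks the ticket asks for: the error class and the entire message.
val e = logicalHintOperatorNotRemovedDuringAnalysisError()
  .asInstanceOf[SparkIllegalStateException]
val errorClassOk = e.errorClass == "INTERNAL_ERROR"
val messageOk    = e.getMessage ==
  "Internal error: logical hint operator should have been removed during analysis"
```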
[jira] [Assigned] (SPARK-38735) Test the error class: INTERNAL_ERROR
[ https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38735:

Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-38735) Test the error class: INTERNAL_ERROR
[ https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695261#comment-17695261 ] Apache Spark commented on SPARK-38735:

User 'the8thC' has created a pull request for this issue: https://github.com/apache/spark/pull/40236
[jira] [Assigned] (SPARK-38735) Test the error class: INTERNAL_ERROR
[ https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38735:

Assignee: Apache Spark