[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696237#comment-17696237 ]

Apache Spark commented on SPARK-42258:
--------------------------------------

User 'FurcyPin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40271

> pyspark.sql.functions should not expose typing.cast
> ---------------------------------------------------
>
>                 Key: SPARK-42258
>                 URL: https://issues.apache.org/jira/browse/SPARK-42258
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.1
>            Reporter: Furcy Pin
>            Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the
> function `typing.cast`.
> This may lead to user errors that can be hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
>
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
> {code}
> which executes without any problem, gives the following result:
>
> {code:java}
> root
>  |-- a: integer (nullable = false)
> {code}
> This is because `f.cast` here calls `typing.cast`, which returns its
> second argument unchanged. The correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema()
> {code}
>
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false)
> {code}
> *Suggested solutions*
> Option 1: The names imported in the module `pyspark.sql.functions` could be
> obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast
> {code}
> Option 2: only import `typing` and replace all occurrences of `cast` with
> `typing.cast`

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
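For readers without a Spark session at hand, the behaviour described in the issue can be reproduced with the standard library alone. The sketch below is only an illustration: `FakeColumn` is a hypothetical stand-in for `pyspark.sql.Column` and is not part of PySpark. It shows that `typing.cast(typ, val)` returns `val` untouched at runtime, which is why the `f.cast("STRING", ...)` call in the report leaves the column as an integer.

```python
from typing import cast


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, name: str, dtype: str) -> None:
        self.name = name
        self.dtype = dtype

    def cast(self, dtype: str) -> "FakeColumn":
        # Mimics the idea of Column.cast: return a new column with the new type.
        return FakeColumn(self.name, dtype.lower())


col = FakeColumn("a", "integer")

# typing.cast is a hint for static type checkers only. At runtime it ignores
# its first argument ("STRING" is never evaluated) and returns the second.
same = cast("STRING", col)

# The method on the column object is what actually performs a conversion.
converted = col.cast("STRING")

print(same is col, same.dtype, converted.dtype)
```

Because `typing.cast` never inspects its first argument, no exception is raised, which matches the report's observation that the code "executes without any problem".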
[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688191#comment-17688191 ]

Hyukjin Kwon commented on SPARK-42258:
--------------------------------------

either way is fine to me
[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688120#comment-17688120 ]

Furcy Pin commented on SPARK-42258:
-----------------------------------

Sure, I can do that. Do you have any preference between option 1 and 2 ?
I believe 2 is cleaner.
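To make the difference between the two options concrete, here is a rough sketch that builds three throwaway modules from source strings and checks which names each one exposes. The module names (`functions_current`, `functions_option1`, `functions_option2`) and the `load` helper are hypothetical and exist only for this illustration; they do not reflect how `pyspark.sql.functions` is actually laid out.

```python
import types


def load(name: str, source: str) -> types.ModuleType:
    """Build an in-memory module from source text (illustration helper)."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    return mod


# Import style described in the issue: `cast` becomes a public attribute.
current = load("functions_current", "from typing import cast\n")

# Option 1: alias the import with a leading underscore.
option1 = load("functions_option1", "from typing import cast as _cast\n")

# Option 2: import the `typing` module and refer to `typing.cast` in the code.
option2 = load("functions_option2", "import typing\n")

# Which of the three modules would let a user write `f.cast(...)`?
exposes = {m.__name__: hasattr(m, "cast") for m in (current, option1, option2)}
print(exposes)
```

Under these assumptions, neither option leaves a `cast` attribute on the module, so a mistaken `f.cast(...)` call would fail with an `AttributeError` instead of silently doing nothing. Option 2 still leaves the `typing` module itself reachable as an attribute, while option 1 leaves the private-looking `_cast`.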
[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast
[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687757#comment-17687757 ]

Hyukjin Kwon commented on SPARK-42258:
--------------------------------------

Good point. Are you interested in submitting a PR?