Shankar Koirala created SPARK-32147: ---------------------------------------
Summary: Spark: PartitionBy changing the columns value
Key: SPARK-32147
URL: https://issues.apache.org/jira/browse/SPARK-32147
Project: Spark
Issue Type: Bug
Components: Spark Core, Spark Shell
Affects Versions: 3.0.0
Reporter: Shankar Koirala

When a DataFrame is saved as Parquet or CSV with partitionBy on a string column whose values are a number followed by 'f' or 'd' (for example "6f" or "7d"), those values come back changed when the data is read again. Below is an example:

{code:java}
scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> val df = Seq(
     |   ("9q", 1),
     |   ("3k", 2),
     |   ("6f", 3),
     |   ("7f", 4),
     |   ("7d", 5)
     | ).toDF("value", "id")
df: org.apache.spark.sql.DataFrame = [value: string, id: int]

scala> df.show(false)
+-----+---+
|value|id |
+-----+---+
|9q   |1  |
|3k   |2  |
|6f   |3  |
|7f   |4  |
|7d   |5  |
+-----+---+

scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")

scala> spark.read.parquet("tmp_parquet").show(false)
+---+-----+
|id |value|
+---+-----+
|5  |7.0  |
|3  |6.0  |
|2  |3k   |
|4  |7.0  |
|1  |9q   |
+---+-----+
{code}

"6f" is read back as 6.0, and "7f" and "7d" both become 7.0, so two distinct partition values end up indistinguishable. The same happens with the other formats too. Is this a bug, or is it expected behavior?

Taken from [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]
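A likely explanation (an assumption, not confirmed in this report): during partition discovery Spark infers a type for each directory value, and its numeric fallback appears to accept Java floating-point literals, in which a trailing `f`/`F` or `d`/`D` is a valid type suffix. That would turn "6f" into 6.0 and both "7f" and "7d" into 7.0; because "9q" and "3k" are not numeric, the column still resolves to a string, so the doubles are shown as "6.0" and "7.0". The standalone Scala sketch below (no Spark needed) reproduces only that parsing step with `java.lang.Double.parseDouble`. `PartitionSuffixDemo`, `parsesAsDouble` and `displayed` are hypothetical names for illustration, not Spark APIs.

```scala
// Hypothetical sketch: mimics the assumed numeric fallback of Spark's
// partition value inference using java.lang.Double.parseDouble.
object PartitionSuffixDemo {
  // Returns Some(value) if the raw directory string is a valid Java
  // double literal (suffixes f/F/d/D are accepted), otherwise None.
  def parsesAsDouble(raw: String): Option[Double] =
    try Some(java.lang.Double.parseDouble(raw))
    catch { case _: NumberFormatException => None }

  // What the value would look like once a numeric literal is rendered
  // back as a string; non-numeric values pass through unchanged.
  def displayed(raw: String): String =
    parsesAsDouble(raw).map(_.toString).getOrElse(raw)

  def main(args: Array[String]): Unit = {
    // Same partition values as in the example above.
    val values = Seq("9q", "3k", "6f", "7f", "7d")
    values.foreach(v => println(s"$v -> ${displayed(v)}"))
  }
}
```

Running `main` prints `9q -> 9q`, `3k -> 3k`, `6f -> 6.0`, `7f -> 7.0` and `7d -> 7.0`, which matches the values read back in the report. If type inference is indeed the cause, setting `spark.sql.sources.partitionColumnTypeInference.enabled` to `false` before reading should keep the partition values as plain strings.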