Maxim Gekk created SPARK-34314:
----------------------------------

             Summary: Wrong discovered partition value
                 Key: SPARK-34314
                 URL: https://issues.apache.org/jira/browse/SPARK-34314
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Maxim Gekk

The example below illustrates the issue:
{code:scala}
// Run in spark-shell; `path` is any writable output directory (not defined in the report).
val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
df.write
  .partitionBy("part")
  .format("parquet")
  .save(path)
val readback = spark.read.parquet(path)
readback.printSchema()
readback.show(false)
{code}
Spark writes the partition values as strings:
{code}
/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
├── _SUCCESS
├── part=-0
│   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
└── part=AA
    └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
{code}
*"-0"* and "AA". But when Spark reads the data back, it transforms "-0" to "0", even though the discovered column type is string:
{code}
root
 |-- id: integer (nullable = true)
 |-- part: string (nullable = true)

+---+----+
|id |part|
+---+----+
|0  |AA  |
|1  |0   |
+---+----+
{code}
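
One way the reported symptom could arise is sketched below in plain Scala, with no Spark dependency. This is an assumption for illustration only, not Spark's actual partition discovery code: the object `PartitionInferenceSketch` and its helpers `infer` and `resolve` are hypothetical names. The model guesses a type for each directory value on its own, and when the guesses disagree it falls back to string by rendering the already-parsed value instead of the raw directory name.

```scala
// Hypothetical model of per-value type inference followed by a fallback to string.
// None of these names come from Spark; they exist only to illustrate the symptom.
object PartitionInferenceSketch {
  sealed trait Inferred
  final case class IntValue(value: Int) extends Inferred
  final case class StrValue(value: String) extends Inferred

  // Guess a type for a single raw partition value, independently of the others.
  def infer(raw: String): Inferred =
    scala.util.Try(raw.toInt).map(IntValue(_)).getOrElse(StrValue(raw))

  // If every value parsed as an integer, keep the raw text; otherwise fall back
  // to string by rendering the *parsed* value, which loses the original spelling.
  def resolve(raws: Seq[String]): Seq[String] = {
    val inferred = raws.map(infer)
    if (inferred.forall(_.isInstanceOf[IntValue])) raws
    else inferred.map {
      case IntValue(v) => v.toString
      case StrValue(v) => v
    }
  }

  def main(args: Array[String]): Unit = {
    // "-0" parses as the integer 0, so the string fallback yields "0".
    println(resolve(Seq("AA", "-0"))) // prints List(AA, 0)
  }
}
```

Under this assumed model, keeping the raw directory name next to the parsed value and using the raw text for the string fallback would preserve "-0".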