Xiaoju Wu created SPARK-23493:
---------------------------------

             Summary: insert-into depends on columns order, otherwise incorrect data inserted
                 Key: SPARK-23493
                 URL: https://issues.apache.org/jira/browse/SPARK-23493
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Xiaoju Wu
It seems the issue still exists; here's the test:

  val data = Seq(
    (7, "test1", 1.0),
    (8, "test#test", 0.0),
    (9, "test3", 0.0)
  )
  import spark.implicits._
  val table = "default.tbl"
  spark
    .createDataset(data)
    .toDF("col1", "col2", "col3")
    .write
    .partitionBy("col1")
    .saveAsTable(table)

  val data2 = Seq(
    (7, "test2", 1.0),
    (8, "test#test", 0.0),
    (9, "test3", 0.0)
  )
  spark
    .createDataset(data2)
    .toDF("col1", "col2", "col3")
    .write
    .insertInto(table)

  sql("select * from " + table).show()

  +---------+----+----+
  |     col2|col3|col1|
  +---------+----+----+
  |test#test| 0.0|   8|
  |    test1| 1.0|   7|
  |    test3| 0.0|   9|
  |        8|null|   0|
  |        9|null|   0|
  |        7|null|   1|
  +---------+----+----+

The issue is the same if you insert with SQL:

  val data = Seq(
    (7, "test1", 1.0),
    (8, "test#test", 0.0),
    (9, "test3", 0.0)
  )
  import spark.implicits._
  val table = "default.tbl"
  spark
    .createDataset(data)
    .toDF("col1", "col2", "col3")
    .write
    .partitionBy("col1")
    .saveAsTable(table)

  sql("insert into " + table + " values(7,'test2',1.0)")

  sql("select * from " + table).show()

  +---------+----+----+
  |     col2|col3|col1|
  +---------+----+----+
  |test#test| 0.0|   8|
  |    test1| 1.0|   7|
  |    test3| 0.0|   9|
  |        7|null|   1|
  +---------+----+----+

No exception was thrown, since I only ran insertInto, not together with partitionBy, but the data was inserted incorrectly. The issue is related to column order: if I instead partition by col3, which is the last column, it works.
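The null rows above are consistent with insertInto pairing columns by position rather than by name: saveAsTable with partitionBy("col1") stores the table schema as (col2, col3, col1) because the partition column is moved to the end, while the incoming DataFrame is still ordered (col1, col2, col3). A minimal plain-Scala sketch of that positional pairing (no Spark required; the column orders are taken from the output above):

```scala
// Table schema after saveAsTable with partitionBy("col1"):
// the partition column is moved to the end.
val tableColumns    = Seq("col2", "col3", "col1")
// Column order of the DataFrame handed to insertInto.
val incomingColumns = Seq("col1", "col2", "col3")
// One incoming row, shown as strings for simplicity.
val row             = Seq("7", "test2", "1.0")

// insertInto pairs columns by POSITION:
// col2 <- 7, col3 <- "test2" (fails the cast to double => null), col1 <- 1.0
val byPosition = tableColumns.zip(row).toMap

// Pairing by NAME, which is what the caller intended:
val byName = incomingColumns.zip(row).toMap

println(byPosition)
println(byName)
```

This reproduces the |7|null|1| row: "test2" lands in the double column col3 and becomes null after the failed cast, while 1.0 lands in the int partition column col1.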
  val data = Seq(
    (7, "test1", 1.0),
    (8, "test#test", 0.0),
    (9, "test3", 0.0)
  )
  import spark.implicits._
  val table = "default.tbl"
  spark
    .createDataset(data)
    .toDF("col1", "col2", "col3")
    .write
    .partitionBy("col3")
    .saveAsTable(table)

  val data2 = Seq(
    (7, "test2", 1.0),
    (8, "test#test", 0.0),
    (9, "test3", 0.0)
  )
  spark
    .createDataset(data2)
    .toDF("col1", "col2", "col3")
    .write
    .insertInto(table)

  sql("select * from " + table).show()

  +----+---------+----+
  |col1|     col2|col3|
  +----+---------+----+
  |   8|test#test| 0.0|
  |   9|    test3| 0.0|
  |   8|test#test| 0.0|
  |   9|    test3| 0.0|
  |   7|    test1| 1.0|
  |   7|    test2| 1.0|
  +----+---------+----+

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
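Until insertInto resolves columns by name, a commonly suggested workaround (my suggestion, not part of the report) is to select the DataFrame's columns in the table's schema order before inserting, e.g. df.select(spark.table(table).columns.map(col): _*).write.insertInto(table). The reordering step itself can be sketched in plain Scala with a hypothetical helper:

```scala
// Hypothetical helper: permute an incoming row into the table's column order,
// so that a purely positional insert lines each value up with the right column.
def reorderToTable(tableCols: Seq[String],
                   dfCols: Seq[String],
                   row: Seq[Any]): Seq[Any] =
  tableCols.map(c => row(dfCols.indexOf(c)))

val reordered = reorderToTable(
  Seq("col2", "col3", "col1"), // table schema (partition column last)
  Seq("col1", "col2", "col3"), // DataFrame column order
  Seq(7, "test2", 1.0)         // the row that was mis-inserted above
)
println(reordered) // List(test2, 1.0, 7)
```

With the values permuted this way, the positional pairing used by insertInto puts test2 into col2, 1.0 into col3, and 7 into col1, which is what the caller intended.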