Xiaoju Wu created SPARK-23493:
---------------------------------

             Summary: insert-into depends on column order, otherwise incorrect data is inserted
                 Key: SPARK-23493
                 URL: https://issues.apache.org/jira/browse/SPARK-23493
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Xiaoju Wu


It seems the issue still exists; here's the test:

val data = Seq(
  (7, "test1", 1.0),
  (8, "test#test", 0.0),
  (9, "test3", 0.0)
)
import spark.implicits._

val table = "default.tbl"
spark
  .createDataset(data)
  .toDF("col1", "col2", "col3")
  .write
  .partitionBy("col1")
  .saveAsTable(table)

val data2 = Seq(
  (7, "test2", 1.0),
  (8, "test#test", 0.0),
  (9, "test3", 0.0)
)

spark
  .createDataset(data2)
  .toDF("col1", "col2", "col3")
  .write
  .insertInto(table)

sql("select * from " + table).show()

+---------+----+----+
|     col2|col3|col1|
+---------+----+----+
|test#test| 0.0|   8|
|    test1| 1.0|   7|
|    test3| 0.0|   9|
|        8|null|   0|
|        9|null|   0|
|        7|null|   1|
+---------+----+----+
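The mismatch seems to come from two behaviors combining: saveAsTable with partitionBy appends the partition column (col1) to the end of the stored table schema, and insertInto resolves columns by position, not by name. A common workaround (a sketch only, reusing `table` and `data2` from the repro above) is to reorder the DataFrame to the table's column order before inserting:

```scala
// Sketch of a workaround: align the DataFrame with the table's stored
// column order before the positional insertInto.
import org.apache.spark.sql.functions.col

// After partitionBy("col1"), the table reads back as (col2, col3, col1).
val tableCols = spark.table(table).columns

spark
  .createDataset(data2)
  .toDF("col1", "col2", "col3")
  .select(tableCols.map(col): _*) // reorder to match the table schema
  .write
  .insertInto(table)
```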

 

The issue is the same if you insert with SQL:

val data = Seq(
  (7, "test1", 1.0),
  (8, "test#test", 0.0),
  (9, "test3", 0.0)
)
import spark.implicits._

val table = "default.tbl"
spark
  .createDataset(data)
  .toDF("col1", "col2", "col3")
  .write
  .partitionBy("col1")
  .saveAsTable(table)

sql("insert into " + table + " values(7,'test2',1.0)")
sql("select * from " + table).show()

+---------+----+----+
|     col2|col3|col1|
+---------+----+----+
|test#test| 0.0|   8|
|    test1| 1.0|   7|
|    test3| 0.0|   9|
|        7|null|   1|
+---------+----+----+

No exception was thrown, since I ran insertInto alone rather than together with partitionBy; the data were simply inserted incorrectly. The issue is related to column order: if I instead partition by col3, which is the last column, it works.
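The scrambled |7|null|1| row can be reproduced by hand. Here is a plain-Scala illustration (no Spark involved; the schema and values are taken from the repro above) of how positional matching produces it:

```scala
// The stored table schema after partitionBy("col1") is (col2, col3, col1),
// but the VALUES clause supplies the row in writer order (col1, col2, col3).
val tableColumns   = Seq("col2", "col3", "col1")
val insertedValues = Seq("7", "test2", "1.0")

// Positional pairing, as insertInto appears to do:
val pairing = tableColumns.zip(insertedValues)
// -> Seq(("col2","7"), ("col3","test2"), ("col1","1.0"))
// col3 is a double column, so the string "test2" casts to null, yielding
// the |7|null|1| row in the output above.
```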

val data = Seq(
  (7, "test1", 1.0),
  (8, "test#test", 0.0),
  (9, "test3", 0.0)
)
import spark.implicits._

val table = "default.tbl"
spark
  .createDataset(data)
  .toDF("col1", "col2", "col3")
  .write
  .partitionBy("col3")
  .saveAsTable(table)

val data2 = Seq(
  (7, "test2", 1.0),
  (8, "test#test", 0.0),
  (9, "test3", 0.0)
)

spark
  .createDataset(data2)
  .toDF("col1", "col2", "col3")
  .write
  .insertInto(table)

sql("select * from " + table).show()

+----+---------+----+
|col1|     col2|col3|
+----+---------+----+
|   8|test#test| 0.0|
|   9|    test3| 0.0|
|   8|test#test| 0.0|
|   9|    test3| 0.0|
|   7|    test1| 1.0|
|   7|    test2| 1.0|
+----+---------+----+
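A quick way to see the mismatch (a sketch; run in the same session as the repros above) is to compare the catalog table's schema with the writer DataFrame's:

```scala
// Partition columns are appended to the end of the stored schema, so the
// catalog order differs from the writer's order whenever a non-trailing
// column is partitioned on.
spark.table("default.tbl").printSchema()
// With partitionBy("col1") the table reads back as (col2, col3, col1),
// while the writer DataFrame is (col1, col2, col3): a positional mismatch.
// With partitionBy("col3") the two orders coincide, so the insert works.
```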



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
