Sorry, last mail format was not good.
println("Going to talk to mySql")
// Read table from mySQL.
val mysqlDF = spark.read.jdbc(jdbcUrl, table, properties)
println("I am back from mySql")
mysqlDF.show()
// Create a new Dataframe with column 'id' increased to avoid Duplicate
primary keys
val newDF = mysqlDF.select((col("id") + 10).as("id"), col("country"),
col("city"))
newDF.printSchema()
newDF.show()
// Insert records into the table.
newDF.write
.mode(SaveMode.Append)
.jdbc(jdbcUrl, table, properties)
// Write to Hive - This Creates a new table.
newDF.write.saveAsTable("cities")
newDF.show()
Going to talk to mySql
I am back from mySql
+---+--+-+
| id| country| city|
+---+--+-+
| 1| USA|Palo Alto|
| 2|Czech Republic| Brno|
| 3| USA|Sunnyvale|
| 4| null| null|
+---+--+-+
root
|-- id: long (nullable = false)
|-- country: string (nullable = true)
|-- city: string (nullable = true)
+---+--+-+
| id| country| city|
+---+--+-+
| 11| USA|Palo Alto|
| 12|Czech Republic| Brno|
| 13| USA|Sunnyvale|
| 14| null| null|
+---+--+-+
+---+--+-+
| id| country| city|
+---+--+-+
| 11| USA|Palo Alto|
| 12|Czech Republic| Brno|
| 13| USA|Sunnyvale|
| 14| null| null|
| 24| null| null|
| 23| USA|Sunnyvale|
| 22|Czech Republic| Brno|
| 21| USA|Palo Alto|
+---+--+-+
Thanks,
Ravi
From: ryanda...@gmail.com
Sent: Wednesday, August 29, 2018 8:19 PM
To: user@spark.apache.org
Subject: Spark code to write to MySQL and Hive
Hi,
Can anyone help me to understand what is happening with my code ?
I wrote a Spark application to read from a MySQL table [that already has 4
records], Create a new DF by adding 10 to the ID field. Then, I wanted to
write the new DF to MySQL as well as to Hive.
I am surprised to see additional set of records in Hive !! I am not able to
understand how the newDF has records with IDs 21 to 24. I know that a DF is
immutable. If so, how come it has 4 records at one point and 8 records at
later point ?
// Read table from mySQL.
val mysqlDF = spark.read.jdbc(jdbcUrl, table, properties)
println("I am back from mySql")
mysqlDF.show()
// Create a new Dataframe with column 'id' increased to avoid Duplicate
primary keys
val newDF = mysqlDF.select((col("id") + 10).as("id"), col("country"),
col("city"))
newDF.printSchema()
newDF.show()
// Insert records into the MySQL table.
newDF.write
.mode(SaveMode.Append)
.jdbc(jdbcUrl, table, properties)
// Write to Hive - This Creates a new table.
newDF.write.saveAsTable("cities")
newDF.show()
Records already existing in mySql
+---+--+-+
| id| country| city|
+---+--+-+
| 1| USA|Palo Alto|
| 2|Czech Republic| Brno|
| 3| USA|Sunnyvale|
| 4| null| null|
+---+--+-+
root
|-- id: long (nullable = false)
|-- country: string (nullable = true)
|-- city: string (nullable = true)
newDF.show()
+---+--+-+
| id| country| city|
+---+--+-+
| 11| USA|Palo Alto|
| 12|Czech Republic| Brno|
| 13| USA|Sunnyvale|
| 14| null| null|
+---+--+-+
+---+--+-+
| id| country| city|
+---+--+-+
| 11| USA|Palo Alto|
| 12|Czech Republic| Brno|
| 13| USA|Sunnyvale|
| 14| null| null|
| 24| null| null|
| 23| USA|Sunnyvale|
| 22|Czech Republic| Brno|
| 21| USA|Palo Alto|
+---+--+-+
Thanks for you time.
Ravi