Hi Mich,

Thank you for the explanation. That makes sense and helps me understand the
bigger picture of how Spark and RDBMSs interact.

Happy to know I’m already following best practice.

Cheers,

Jake

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Monday, August 21, 2017 at 6:44 PM
To: Jake Russ <jr...@bloomintelligence.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Update MySQL table via Spark/SparkR?

Hi Jake,
This is an issue across all RDBMSs, including Oracle. When you update, you have
to commit or roll back in the RDBMS itself, and I am not aware of Spark doing
that.
The staging table is a safer method, as it follows an ETL-type approach: you
write the new data to a staging table in the RDBMS and then do the DML in the
RDBMS itself, where you can control commit or rollback. That is the way I would
do it, and a simple shell script can drive both steps.
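As an illustration, a minimal sketch of the RDBMS-side step. The table names,
credentials, and the use of the DBI/RMySQL packages here are all assumptions on
my part, not a definitive recipe:

library(DBI)

# Connect directly to MySQL (outside Spark) so the UPDATE runs under
# transactional control. All connection details are placeholders.
con <- dbConnect(RMySQL::MySQL(),
                 dbname = "mydb", host = "localhost",
                 user = "user", password = "secret")

dbBegin(con)  # open a transaction so the DML can be committed or rolled back
tryCatch({
  # Apply the staged rows to the target table (hypothetical schema)
  dbExecute(con, "
    UPDATE target t
    JOIN   staging s ON t.id = s.id
    SET    t.value = s.value")
  dbCommit(con)
}, error = function(e) {
  dbRollback(con)  # undo the partial update on failure
  stop(e)
})

dbDisconnect(con)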
HTH



Dr Mich Talebzadeh

LinkedIn: 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 21 August 2017 at 15:50, Jake Russ <jr...@bloomintelligence.com> wrote:
Hi everyone,

I’m currently using SparkR to read data from a MySQL database, perform some 
calculations, and then write the results back to MySQL. Is it still true that 
Spark does not support UPDATE queries via JDBC? Many posts, and SPARK-19335 
(https://issues.apache.org/jira/browse/SPARK-19335), indicate that Spark’s 
DataFrameWriter will only “append” to or “overwrite” existing tables. The best 
advice I’ve found so far is to write to a staging table in MySQL 
(https://stackoverflow.com/questions/34643200/spark-dataframes-upsert-to-postgres-table)
 and then perform the UPDATE query on the MySQL side.
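
For concreteness, here is roughly what the staging write looks like from 
SparkR. Everything in this sketch is a placeholder (toy data, URL, 
credentials), and it assumes the MySQL JDBC driver jar is already on Spark’s 
classpath:

library(SparkR)
sparkR.session()

# Toy result set standing in for the real computed SparkDataFrame
results <- createDataFrame(data.frame(id = c(1L, 2L), value = c(10, 20)))

# Write the results to a staging table; "overwrite" replaces the staging
# table's previous contents on each run. URL and credentials are placeholders.
write.jdbc(results,
           url       = "jdbc:mysql://localhost:3306/mydb",
           tableName = "staging",
           mode      = "overwrite",
           user      = "user",
           password  = "secret",
           driver    = "com.mysql.jdbc.Driver")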

Ideally, I’d like to handle the update during the write operation itself. Has 
anyone else encountered this limitation and found a better solution?

Thank you,

Jake
