Hi Mich,

Thank you for the explanation. That makes sense, and it helps me understand the bigger picture between Spark and the RDBMS.
Happy to know I’m already following best practice.

Cheers,
Jake

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Monday, August 21, 2017 at 6:44 PM
To: Jake Russ <jr...@bloomintelligence.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Update MySQL table via Spark/SparkR?

Hi Jake,

This is an issue across all RDBMSs, including Oracle. When you are updating, you have to commit or roll back in the RDBMS itself, and I am not aware of Spark doing that.

The staging table is a safer method, as it follows an ETL-type approach. You create the new data in a staging table in the RDBMS and do the DML in the RDBMS itself, where you can control commit or rollback. That is the way I would do it. A simple shell script can do both.

HTH

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 21 August 2017 at 15:50, Jake Russ <jr...@bloomintelligence.com> wrote:

Hi everyone,

I’m currently using SparkR to read data from a MySQL database, perform some calculations, and then write the results back to MySQL. Is it still true that Spark does not support UPDATE queries via JDBC?

I’ve seen many posts on the internet saying that Spark’s DataFrameWriter does not support UPDATE queries via JDBC (https://issues.apache.org/jira/browse/SPARK-19335). It will only “append” to or “overwrite” existing tables.
The best advice I’ve found so far for performing this update is to write to a staging table in MySQL (https://stackoverflow.com/questions/34643200/spark-dataframes-upsert-to-postgres-table) and then perform the UPDATE query on the MySQL side. Ideally, I’d like to handle the update during the write operation. Has anyone else encountered this limitation and found a better solution?

Thank you,
Jake
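The staging-table approach discussed in this thread (write the new results to a staging table, then run the UPDATE inside the database, where commit and rollback are under your control) can be sketched as below. This is a minimal illustration under stated assumptions, not a drop-in solution: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, the plain INSERT into the staging table stands in for Spark's JDBC write in "overwrite" mode, and the table and column names (results, results_staging, id, score) are hypothetical.

```python
import sqlite3


def load_staging(conn, rows):
    """Replace the staging table's contents with freshly computed rows.

    Stand-in (assumption) for the Spark side: in the thread this step would
    be a JDBC write with mode "overwrite" into the staging table.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM results_staging")
        conn.executemany(
            "INSERT INTO results_staging (id, score) VALUES (?, ?)", rows
        )


def apply_staged_updates(conn):
    """Run the UPDATE inside the database as a single transaction.

    Returns the number of target rows updated. On any database error the
    transaction is rolled back, leaving the target table unchanged.
    """
    try:
        cur = conn.execute(
            """
            UPDATE results
            SET score = (SELECT s.score FROM results_staging AS s
                         WHERE s.id = results.id)
            WHERE id IN (SELECT id FROM results_staging)
            """
        )
        conn.commit()
        return cur.rowcount
    except sqlite3.Error:
        conn.rollback()
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical target and staging tables with the same shape.
    conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, score REAL)")
    conn.execute("CREATE TABLE results_staging (id INTEGER PRIMARY KEY, score REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", [(1, 0.0), (2, 0.0), (3, 0.0)])
    conn.commit()

    load_staging(conn, [(1, 10.5), (3, 7.25)])
    print(apply_staged_updates(conn))  # 2
    print(conn.execute("SELECT id, score FROM results ORDER BY id").fetchall())
    # [(1, 10.5), (2, 0.0), (3, 7.25)]
```

On MySQL itself, the same step is more commonly written as a multi-table update, for example `UPDATE results r JOIN results_staging s ON r.id = s.id SET r.score = s.score;`, and the commit/rollback control only applies if the tables use a transactional storage engine such as InnoDB.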