Hi Jake,

There is another option: among the third-party projects in the Spark database ecosystem, some have combined Spark with a DBMS in such a way that the DataFrame API has been extended to include UPDATE operations <http://snappydatainc.github.io/snappydata/programming_guide/#create-row-tables-using-api-update-the-contents-of-row-table>. However, in your case you would have to move away from MySQL in order to use this API.
Best,
Pierce

On Tue, Aug 22, 2017 at 7:54 AM, Jake Russ <jr...@bloomintelligence.com> wrote:

> Hi Mich,
>
> Thank you for the explanation. That makes sense, and it helps me
> understand the bigger picture between Spark and an RDBMS.
>
> Happy to know I’m already following best practice.
>
> Cheers,
>
> Jake
>
> *From: *Mich Talebzadeh <mich.talebza...@gmail.com>
> *Date: *Monday, August 21, 2017 at 6:44 PM
> *To: *Jake Russ <jr...@bloomintelligence.com>
> *Cc: *"user@spark.apache.org" <user@spark.apache.org>
> *Subject: *Re: Update MySQL table via Spark/SparkR?
>
> Hi Jake,
>
> This is an issue across all RDBMSs, including Oracle etc. When you are
> updating, you have to commit or roll back in the RDBMS itself, and I am
> not aware of Spark doing that.
>
> The staging table is a safer method, as it follows an ETL-type approach.
> You create the new data in the staging table in the RDBMS and do the DML
> in the RDBMS itself, where you can control commit or rollback. That is
> the way I would do it. A simple shell script can do both.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> *https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
> On 21 August 2017 at 15:50, Jake Russ <jr...@bloomintelligence.com> wrote:
>
> Hi everyone,
>
> I’m currently using SparkR to read data from a MySQL database, perform
> some calculations, and then write the results back to MySQL. Is it still
> true that Spark does not support UPDATE queries via JDBC?
> I’ve seen many posts on the internet saying that Spark’s DataFrameWriter
> does not support UPDATE queries via JDBC
> <https://issues.apache.org/jira/browse/SPARK-19335>; it will only
> “append” to or “overwrite” existing tables. The best advice I’ve found so
> far, for performing this update, is to write to a staging table in MySQL
> <https://stackoverflow.com/questions/34643200/spark-dataframes-upsert-to-postgres-table>
> and then perform the UPDATE query on the MySQL side.
>
> Ideally, I’d like to handle the update during the write operation. Has
> anyone else encountered this limitation and found a better solution?
>
> Thank you,
>
> Jake
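For anyone finding this thread later, the staging-table approach Mich describes can be sketched generically. This is a minimal illustration only: it uses SQLite in place of MySQL, and plain SQL in place of Spark's JDBC writer, purely to show the flow. In practice, Spark (e.g. `df.write.jdbc(..., mode="append")`) would land the recalculated rows in the staging table, and the UPDATE/INSERT would then run entirely inside the RDBMS, where commit and rollback are under the database's control. The table and column names here are hypothetical.

```python
import sqlite3

# Hypothetical schema: a target table plus a staging table that Spark's
# JDBC writer would normally populate in "append" mode.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, score REAL)")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 0.1), (2, 0.2)])

# Step 1 -- stand-in for Spark appending recalculated rows to staging.
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(2, 0.9), (3, 0.3)])

# Step 2 -- the DML runs inside the database, in one transaction, so the
# RDBMS controls commit/rollback ("with conn" commits on success and
# rolls back on any exception).
with conn:
    # Update rows that already exist in the target.
    conn.execute("""
        UPDATE target
        SET score = (SELECT s.score FROM staging s WHERE s.id = target.id)
        WHERE id IN (SELECT id FROM staging)
    """)
    # Insert staged rows the target has not seen yet.
    conn.execute("""
        INSERT INTO target
        SELECT id, score FROM staging
        WHERE id NOT IN (SELECT id FROM target)
    """)
    # Clear the staging table for the next batch.
    conn.execute("DELETE FROM staging")

rows = conn.execute("SELECT id, score FROM target ORDER BY id").fetchall()
print(rows)  # -> [(1, 0.1), (2, 0.9), (3, 0.3)]
```

As Mich notes, a simple shell script (or any scheduler) can drive both steps: run the Spark job that appends to the staging table, then invoke the SQL above via the database's own client, so the transactional guarantees stay with the RDBMS rather than Spark.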