OK, as I understand it, you mean pushing data from Spark to an Oracle database via JDBC? Correct.
There are a number of ways to do so. The most common is using Sqoop to get the data from an HDFS file or Hive table into the Oracle database. With Spark you can also use that method, either by storing the data in a Hive table and letting Sqoop do the job, or by referencing directly where the data is stored.

Sqoop uses JDBC for this work, and I believe it delivers data in batch transactions, configurable with "sqoop.export.statements.per.transaction". Have a look at sqoop export --help. With batch transactions, depending on the size of the batch, you may end up with a partial delivery of data in case of issues such as a network failure or running out of space in the Oracle schema.

It really boils down to the volume of the data and the way this is going to happen; say your job runs as a cron, you may do parallel processing with multiple connections to the Oracle database. In general it should work, and much like what we see loading data from an HDFS file to Oracle, it should follow JDBC protocols.

One interesting concept that I would like to try is loading data from RDD -> DF -> temporary table in Spark, then pushing the data to the Oracle DB via JDBC. I have done this the other way round with no problem.

Like any load, you are effectively doing an ETL from Spark to Oracle, and you are better off loading the data into an Oracle staging table first; once it has all gone through and you have checked the job, push the data from the staging table to the main table in Oracle to reduce the risk of failure.

HTH

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated.
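The load-to-staging-then-promote pattern can be sketched as follows. This is a minimal illustration only, using Python's built-in sqlite3 as a stand-in for an Oracle connection over JDBC (the table and column names are made up); the same load / validate / promote flow applies against a real Oracle schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for an Oracle connection
conn.execute("CREATE TABLE sales_staging (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

# Step 1: land the batch produced by the Spark job in the staging table first
rows = [(1, 10.0), (2, 20.0), (3, 30.0)]
conn.executemany("INSERT INTO sales_staging VALUES (?, ?)", rows)
conn.commit()

# Step 2: check the job, e.g. the row count matches what the job produced
(count,) = conn.execute("SELECT COUNT(*) FROM sales_staging").fetchone()
assert count == len(rows), "staging load incomplete -- do not promote"

# Step 3: promote staging -> main in one transaction, then clear staging.
# The "with conn:" block commits on success and rolls back on any exception.
with conn:
    conn.execute("INSERT INTO sales SELECT * FROM sales_staging")
    conn.execute("DELETE FROM sales_staging")

(main_count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
print(main_count)  # 3
```

The point of the staging table is that the main table is only ever touched by the final promote step, which either completes or rolls back as a unit, so a half-finished load never reaches it.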
It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

From: Divya Gehlot [mailto:divya.htco...@gmail.com]
Sent: 21 February 2016 00:09
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc: user @spark <user@spark.apache.org>; Russell Jurney <russell.jur...@gmail.com>; Jörn Franke <jornfra...@gmail.com>
Subject: RE: Spark JDBC connection - data writing success or failure cases

Thanks for the input, everyone.

What I am trying to understand is: if I use Oracle to store my data after Spark job processing, and the Spark job fails halfway through, what happens then? Does a rollback happen, or do we have to handle this kind of situation programmatically in the Spark job itself? How are transactions handled from Spark to Oracle storage?

My apologies for such a naive question.

Thanks,
Divya

agreed

Dr Mich Talebzadeh
From: Russell Jurney [mailto:russell.jur...@gmail.com]
Sent: 19 February 2016 16:49
To: Jörn Franke <jornfra...@gmail.com>
Cc: Divya Gehlot <divya.htco...@gmail.com>; user @spark <user@spark.apache.org>
Subject: Re: Spark JDBC connection - data writing success or failure cases

Oracle is a perfectly reasonable endpoint for publishing data processed in Spark. I've got to assume he's using it that way and not as a stand-in for HDFS?

On Friday, February 19, 2016, Jörn Franke <jornfra...@gmail.com> wrote:

Generally, an Oracle DB should not be used as a storage layer for Spark, for performance reasons. You should consider HDFS. This will also help you with fault tolerance.

> On 19 Feb 2016, at 03:35, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> I have a Spark job which connects to an RDBMS (in my case it is Oracle).
> How can we check that the complete data write was successful?
> Can I use commit in case of success, or rollback in case of failure?
>
> Thanks,
> Divya

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io