OK, as I understand it, you mean pushing data from Spark to an Oracle database via JDBC? Correct.
There are a number of ways to do so. The most common is using Sqoop to get the data from an HDFS file or Hive table into the Oracle database. With Spark you can also use that method, either by storing the data in a Hive table and letting Sqoop do the job, or by referencing directly where the data is stored.

Sqoop uses JDBC for this work, and I believe it delivers data in batch transactions, configurable with "sqoop.export.statements.per.transaction". Have a look at sqoop export --help. With batch transactions, depending on the size of the batch, you may end up with a partial delivery of data in case of issues such as a network failure or running out of space in the Oracle schema.

It really boils down to the volume of the data and the way this is going to happen; say your job runs as a cron, you may do parallel processing with multiple connections to the Oracle database. In general it should work, and much like what we see loading data from an HDFS file to Oracle, it should follow JDBC protocols.

One interesting concept that I would like to try is loading data from RDD -> DF -> temporary table in Spark, then pushing the data to the Oracle DB via JDBC. I have done this the other way round with no problem.

Like any load, you are effectively doing an ETL from Spark to Oracle, and you are better off loading the data into an Oracle staging table first; once it has all gone through and you have checked the job, push the data from the staging table to the main table in Oracle to reduce the risk of failure.

HTH

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated.
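The load-to-staging-then-promote pattern can be sketched as follows. This is a minimal illustration only, using Python's built-in sqlite3 as a stand-in for an Oracle connection over JDBC (the table and column names are made up); the same load / validate / promote flow applies against a real Oracle schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for an Oracle connection
conn.execute("CREATE TABLE sales_staging (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

# Step 1: land the batch produced by the Spark job in the staging table first
rows = [(1, 10.0), (2, 20.0), (3, 30.0)]
conn.executemany("INSERT INTO sales_staging VALUES (?, ?)", rows)
conn.commit()

# Step 2: check the job, e.g. the row count matches what the job produced
(count,) = conn.execute("SELECT COUNT(*) FROM sales_staging").fetchone()
assert count == len(rows), "staging load incomplete -- do not promote"

# Step 3: promote staging -> main in one transaction, then clear staging.
# The "with conn:" block commits on success and rolls back on any exception.
with conn:
    conn.execute("INSERT INTO sales SELECT * FROM sales_staging")
    conn.execute("DELETE FROM sales_staging")

(main_count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
print(main_count)  # 3
```

The point of the staging table is that the main table is only ever touched by the final promote step, which either completes or rolls back as a unit, so a half-finished load never reaches it.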
It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

From: Divya Gehlot [mailto:divya.htco...@gmail.com]
Sent: 21 February 2016 00:09
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc: user @spark <user@spark.apache.org>; Russell Jurney <russell.jur...@gmail.com>; Jörn Franke <jornfra...@gmail.com>
Subject: RE: Spark JDBC connection - data writing success or failure cases

Thanks for the input, everyone.

What I am trying to understand is: if I use Oracle to store my data after Spark job processing, and the Spark job fails halfway through, what happens then? Does a rollback happen, or do we have to handle this kind of situation programmatically in the Spark job itself? How are transactions handled from Spark to Oracle storage?

My apologies for such a naive question.

Thanks,
Divya

agreed

Dr Mich Talebzadeh
From: Russell Jurney [mailto:russell.jur...@gmail.com]
Sent: 19 February 2016 16:49
To: Jörn Franke <jornfra...@gmail.com>
Cc: Divya Gehlot <divya.htco...@gmail.com>; user @spark <user@spark.apache.org>
Subject: Re: Spark JDBC connection - data writing success or failure cases

Oracle is a perfectly reasonable endpoint for publishing data processed in Spark. I've got to assume he's using it that way and not as a stand-in for HDFS?

On Friday, February 19, 2016, Jörn Franke <jornfra...@gmail.com> wrote:

Generally, an Oracle DB should not be used as a storage layer for Spark, for performance reasons. You should consider HDFS. This will also help you with fault tolerance.

> On 19 Feb 2016, at 03:35, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> I have a Spark job which connects to an RDBMS (in my case it is Oracle).
> How can we check that the complete data write was successful?
> Can I use commit in case of success, or rollback in case of failure?
>
> Thanks,
> Divya

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io