Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-27 Thread kuassi . mensah

Mich,

That's right, referring to you guys.

Cheers, Kuassi

On 8/27/20 9:27 AM, Mich Talebzadeh wrote:

Thanks Kuassi,

I presume you mean Spark DEV team by "they are using ... "

cheers,

Mich



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw




*Disclaimer:* Use it at your own risk. Any and all responsibility for 
any loss, damage or destruction of data or any other property which 
may arise from relying on this email's technical content is explicitly 
disclaimed. The author will in no case be liable for any monetary 
damages arising from such loss, damage or destruction.




On Thu, 27 Aug 2020 at 17:11, kuassi.men...@oracle.com wrote:


According to our dev team.


From the error it is evident that they are using a JDBC jar which
does not support setting tns_admin in the URL.
They might have some old jar on the class-path which is being used
instead of the 18.3 jar.
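
One way to check for the old-jar problem is to confirm the manifest version and then put the 18.3 jar explicitly on the driver class-path; this is a sketch using the paths from the transcripts later in this thread, so treat them as illustrative:

```shell
# Confirm the jar really is the 18.3 driver
unzip -p ojdbc8.jar META-INF/MANIFEST.MF | grep Implementation-Version

# Launch spark-shell with that jar explicitly on the classpath and driver class-path
spark-shell \
  --jars /home/hduser/dba/bin/ADW/src/ojdbc8.jar \
  --driver-class-path /home/hduser/dba/bin/ADW/src/ojdbc8.jar
```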
You can ask them to use either the full URL or the TNS alias format URL,
with the tns_admin path set as either a connection property or a system
property.
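
A minimal sketch of that suggestion, using the connection details from the transcripts below (`oracle.net.tns_admin` is the Oracle JDBC property for the wallet/tnsnames directory; this is illustrative, not a tested configuration):

```scala
import java.sql.DriverManager
import java.util.Properties

// TNS alias format URL: no TNS_ADMIN embedded in the URL itself
val url = "jdbc:oracle:thin:@mydb_high"

// Option 1: tns_admin as a connection property
val props = new Properties()
props.setProperty("user", "scratchpad")
props.setProperty("password", "xxx")
props.setProperty("oracle.net.tns_admin", "/home/hduser/dba/bin/ADW/DBAccess")
val connection = DriverManager.getConnection(url, props)

// Option 2: tns_admin as a system property, set before connecting
System.setProperty("oracle.net.tns_admin", "/home/hduser/dba/bin/ADW/DBAccess")
```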

Regards, Kuassi
On 8/26/20 2:11 PM, Mich Talebzadeh wrote:

And this is a test using the Oracle-supplied Java program
DataSourceSample.java, with slight amendments for login/password
and table. It connects OK.

hduser@rhes76: /home/hduser/dba/bin/ADW/src> javac -classpath
./ojdbc8.jar:. DataSourceSample.java
hduser@rhes76: /home/hduser/dba/bin/ADW/src> java -classpath
./ojdbc8.jar:. DataSourceSample
AArray = [B@57d5872c
AArray = [B@667a738
AArray = [B@2145433b
Driver Name: Oracle JDBC driver
Driver Version: 18.3.0.0.0
Default Row Prefetch Value is: 20
Database Username is: SCRATCHPAD

DATETAKEN           WEIGHT
------------------- ------
2017-09-07 07:22:09 74.7
2017-09-08 07:26:18 74.8
2017-09-09 07:15:53 75
2017-09-10 07:53:30 75.9
2017-09-11 07:21:49 75.8
2017-09-12 07:31:27 75.6
2017-09-26 07:11:26 75.4
2017-09-27 07:22:48 75.6
2017-09-28 07:15:52 75.4
2017-09-29 07:30:40 74.9


Regards,





On Wed, 26 Aug 2020 at 21:58, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:

Hi Kuassi,

This is the error. Only test running on local mode

scala> val driverName = "oracle.jdbc.OracleDriver"
driverName: String = oracle.jdbc.OracleDriver

scala> var url =

"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
url: String =
jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess
scala> var _username = "scratchpad"
_username: String = scratchpad
scala> var _password = "xx" -- no special characters
_password: String = xxx
scala> var _dbschema = "SCRATCHPAD"
_dbschema: String = SCRATCHPAD
scala> var _dbtable = "LL_18201960"
_dbtable: String = LL_18201960
scala> var e:SQLException = null
e: java.sql.SQLException = null
scala> var connection:Connection = null
connection: java.sql.Connection = null
scala> var metadata:DatabaseMetaData = null
metadata: java.sql.DatabaseMetaData = null
scala> val prop = new java.util.Properties
prop: java.util.Properties = {}
scala> prop.setProperty("user", _username)
res1: Object = null
scala> prop.setProperty("password",_password)
res2: Object = null
scala> // Check Oracle is accessible
scala> try {
     |   connection = DriverManager.getConnection(url, _username, _password)
     | } catch {
     |   case e: SQLException => e.printStackTrace
     |   connection.close()
     | }
java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
        at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
        at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
        at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread kuassi . mensah
Fwiw here are our write-ups on Java connectivity to Database Cloud 
Services: 
https://www.oracle.com/database/technologies/appdev/jdbc-db-cloud.html


Kuassi

On 8/26/20 1:50 PM, Mich Talebzadeh wrote:

Thanks Jörn,

Only running in REPL in local mode

This works fine connecting with ojdbc6.jar to Oracle 12c.

cheers







On Wed, 26 Aug 2020 at 21:39, Jörn Franke wrote:


Is the directory available on all nodes?
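
If the job ever runs beyond local mode, the wallet directory does have to be visible to every executor. One hedged sketch (assuming spark-submit; the file names are those of the unzipped wallet listed later in this thread, and my_job.jar is a placeholder) is to ship the wallet files with `--files` and point tns_admin at the executor's working directory:

```shell
# Sketch: distribute the wallet files to every executor
spark-submit \
  --jars ojdbc8.jar \
  --files DBAccess/cwallet.sso,DBAccess/sqlnet.ora,DBAccess/tnsnames.ora,DBAccess/ojdbc.properties \
  --conf "spark.executor.extraJavaOptions=-Doracle.net.tns_admin=." \
  my_job.jar
```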


On 26.08.2020 at 22:08, kuassi.men...@oracle.com wrote:



Mich,

All looks fine.
Perhaps some special chars in username or password?


It is recommended not to use characters such as '@' or '.' in
your password.

Best, Kuassi
On 8/26/20 12:52 PM, Mich Talebzadeh wrote:

Thanks Kuassi.

This is the version of the jar file that works OK with a JDBC
connection via Java to ADW

unzip -p ojdbc8.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Implementation-Title: JDBC
Implementation-Version: 18.3.0.0.0
sealed: true
Specification-Vendor: Sun Microsystems Inc.
Specification-Title: JDBC
Class-Path: oraclepki.jar
Implementation-Vendor: Oracle Corporation
Main-Class: oracle.jdbc.OracleDriver
Ant-Version: Apache Ant 1.7.1
Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
Created-By: 25.171-b11 (Oracle Corporation)
Specification-Version: 4.0

And this is the setting for TNS_ADMIN

echo ${TNS_ADMIN}
/home/hduser/dba/bin/ADW/DBAccess

hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
# Connection property while using Oracle wallets.
oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
# FOLLOW THESE STEPS FOR USING JKS
# (1) Uncomment the following properties to use JKS.
# (2) Comment out the oracle.net.wallet_location property above
# (3) Set the correct password for both trustStorePassword and keyStorePassword.
# It's the password you specified when downloading the wallet from OCI Console or the Service Console.
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=
hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess>

Regards,

Mich




On Wed, 26 Aug 2020 at 20:16, kuassi.men...@oracle.com wrote:

Hi,

Which release is the ojdbc8.jar from: 12c, 18c, or 19c? I'd
recommend the ojdbc8.jar from the latest release.
One more thing to pay attention to is the content of the
ojdbc.properties file (part of the unzipped wallet).
Make sure that the ojdbc.properties file has been configured to
use Oracle Wallet, as follows (i.e., anything related to JKS
commented out):


oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=

Alternatively, if you want to use JKS, then you need to
comment out the first line, un-comment the other lines,
and set the values.

Kuassi

On 8/26/20 11:58 AM, Mich Talebzadeh wrote:

Hi,

The connection from Spark to Oracle 12c etc. is well
established using ojdbc6.jar.

I am attempting to connect to Oracle Autonomous Data
warehouse (ADW) version

Oracle Database 19c 

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread kuassi . mensah

Hi,

Which release is the ojdbc8.jar from: 12c, 18c, or 19c? I'd 
recommend the ojdbc8.jar from the latest release.
One more thing to pay attention to is the content of the 
ojdbc.properties file (part of the unzipped wallet).
Make sure that the ojdbc.properties file has been configured to use Oracle 
Wallet, as follows (i.e., anything related to JKS commented out):


oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=

Alternatively, if you want to use JKS, then you need to comment out the 
first line, un-comment the other lines, and set the values.
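
For illustration, the JKS variant of ojdbc.properties would then look roughly like this; the password placeholder is an assumption (it is whatever was chosen when downloading the wallet), so treat this as a sketch rather than a tested configuration:

```properties
# Wallet line commented out; JKS lines un-commented and filled in
#oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
javax.net.ssl.trustStorePassword=<wallet_download_password>
javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
javax.net.ssl.keyStorePassword=<wallet_download_password>
```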


Kuassi

On 8/26/20 11:58 AM, Mich Talebzadeh wrote:

Hi,

The connection from Spark to Oracle 12c etc. is well established using 
ojdbc6.jar.


I am attempting to connect to Oracle Autonomous Data warehouse (ADW) 
version


Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

The Oracle documentation suggests using ojdbc8.jar to 
connect to the database with the following URL format using Oracle Wallet:


"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"

This works fine through Java itself but throws an error with 
Spark version 2.4.3.


The connection string is defined as follows

val url = 
"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"


where the DBAccess directory is the unzipped wallet Wallet_mydb.zip, as 
created for the ADW connection.
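
For reference, the Spark side of the same connection would look something like this sketch; the table and credentials are the ones from the spark-shell transcript earlier in the thread, and the explicit `driver` option is an assumption added to make sure the intended class is used:

```scala
// Sketch: Spark JDBC read against ADW with the wallet-based URL
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "SCRATCHPAD.LL_18201960")
  .option("user", "scratchpad")
  .option("password", "xxx")
  .load()
df.printSchema()
```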


The thing is that this works through a normal connection via Java 
code using the same URL.


So the question is whether there is a dependency in the Spark JDBC 
connection on the ojdbc version.


The error I am getting is:

java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
        at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
        at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
        at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)

This Oracle doc explains the connectivity.


The unzipped wallet has the following files:

 ls DBAccess/
README  cwallet.sso  ewallet.p12  keystore.jks  ojdbc.properties  
sqlnet.ora  tnsnames.ora  truststore.jks



Thanks

Mich







Re: spark architecture question -- Please Read

2017-02-05 Thread kuassi mensah
Apologies in advance for injecting an Oracle product into this discussion, 
but I thought it might help address the requirements (as far as I understood 
them).
We are looking into furnishing for Spark a new connector, similar to the 
Oracle Datasource for Hadoop, which 
will implement the Spark DataSource interfaces for Oracle Database.


In summary, it will:

 * allow parallel and direct access to the Oracle database (with an option
   to control the number of concurrent connections)
 * introspect the Oracle table, then dynamically generate partitions of
   Spark JDBCRDDs based on the split pattern, and rewrite Spark SQL
   queries into Oracle SQL queries for each partition. The typical use
   case consists of joining fact data (or Big Data) with master data in
   Oracle.
 * hook into the Oracle JDBC driver for faster type conversions
 * implement predicate pushdown, partition pruning, and column projections
   to the Oracle database, thereby reducing the amount of data to be
   processed in Spark
 * write back to an Oracle table (through parallel insert) the result of
   Spark SQL processing, for further mining by traditional BI tools.

You may reach out to me offline for more details if interested.

Kuassi

On 1/29/2017 3:39 AM, Mich Talebzadeh wrote:

This is classic, nothing special about it.

 1. Your source is Oracle schema tables
 2. You can use Oracle JDBC connection with DIRECT CONNECT and
parallel processing to read your data from Oracle table into Spark
FP using JDBC. Ensure that you are getting data from Oracle DB at
a time when the DB is not busy and network between your Spark and
Oracle is reasonable. You will be creating multiple connections to
your Oracle database from Spark
 3. Create a DF from RDD and ingest your data into Hive staging
tables. This should be pretty fast. If you are using a recent
version of Spark > 1.5 you can see this in Spark GUI
 4. Once data is ingested into Hive table (frequency Discrete,
Recurring or Cumulative), then you have your source data in Hive
 5. Do your work in Hive staging tables and then your enriched data
will go into Hive enriched tables (different from your staging
tables). You can use Spark to enrich (transform) your data on Hive
staging tables
 6. Then use Spark to send that data into Oracle table. Again bear in
mind that the application has to handle consistency from Big Data
into RDBMS. For example what you are going to do with failed
transactions in Oracle
 7. From my experience you also need some staging tables in Oracle to
handle inserts from Hive via Spark into Oracle tables
 8. Finally run a job in PL/SQL to load Oracle target tables from
Oracle staging tables
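
Steps 2, 3 and 6 above can be sketched roughly as follows; the host, service, table, and partitioning column names are placeholders I've made up for illustration, not details from the thread:

```scala
// Step 2: parallel JDBC read from Oracle (numPartitions controls
// the number of concurrent connections to the database)
val staging = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//orahost:1521/service")  // placeholder host/service
  .option("dbtable", "SRC_SCHEMA.FACT_TABLE")                 // placeholder table
  .option("user", "user").option("password", "password")
  .option("partitionColumn", "id")                            // assumed numeric key
  .option("lowerBound", "1").option("upperBound", "10000000")
  .option("numPartitions", "8")
  .load()

// Step 3: land it in a Hive staging table
staging.write.mode("overwrite").saveAsTable("staging.fact_table")

// Step 6: write enriched results back to an Oracle staging table
val enriched = spark.table("enriched.fact_table")
enriched.write.mode("append")
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//orahost:1521/service")
  .option("dbtable", "TGT_SCHEMA.STAGING_RESULTS")
  .option("user", "user").option("password", "password")
  .save()
```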

Notes:

Oracle column types are not 100% compatible with Spark. For example, Spark 
does not recognize the CHAR column type, which has to be converted into 
VARCHAR or STRING.
Hive does not have the concept of the Oracle "WITH CLAUSE" inline table, 
so a script that works in Oracle may not work in Hive. Windowing 
functions should be fine.


I tend to do all this via a shell script that gives control at each 
layer and creates alarms.


HTH




Dr Mich Talebzadeh

http://talebzadehmich.wordpress.com





On 29 January 2017 at 10:18, Alex wrote:


Hi All,

Thanks for your response .. Please find below flow diagram

Please help me out simplifying this architecture using Spark

1) Can I skip steps 1 to 4 and directly store the data in Spark?
If I am storing it in Spark, where is it actually getting stored?
Do I need to retain Hadoop to store data, or can I store it
directly in Spark and remove Hadoop as well?

I want to remove Informatica for preprocessing and directly load
the file data coming from the server into Hadoop/Spark.

So my question is: can I load file data directly into Spark? Then
where exactly will the data get stored? Do I need to have Spark
installed on top of HDFS?

2) If I am retaining the architecture below, can I store the output
from Spark directly back to Oracle, from step 5 to step 7? And will
Spark's way of storing it back to Oracle be better, performance-wise,
than using Sqoop?

3) Can I use Spark Scala UDFs to process data from Hive and retain
the entire architecture?

Which among the above would be optimal?


On Sat, Jan 28, 2017 at 10:38 PM, Sachin Naik