RE: Stored proc with spark

2016-02-16 Thread Mich Talebzadeh
One thing to be aware of: convert Oracle NUMBER and NUMBER(m,n) columns to VARCHAR2 at source (via TO_CHAR()), as Spark will otherwise throw overflow errors.

It is better to use TO_CHAR() in Oracle than to write a UDF in Spark. UDFs in any language are slower than the generic built-in functions.
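The conversion can be pushed down to Oracle inside the JDBC dbtable subquery. A sketch for the Spark 1.x shell (the URL, credentials, and table are placeholders):

```scala
// Sketch only: push TO_CHAR() down to Oracle so NUMBER columns arrive
// as strings and cannot overflow Spark's decimal handling.
// Connection details and table names below are placeholders.
val sales = sqlContext.load("jdbc",
  Map("url"      -> "jdbc:oracle:thin:@myhost:1521:mydb",
      "dbtable"  -> "(SELECT TO_CHAR(AMOUNT_SOLD) AS AMOUNT_SOLD, TIME_ID FROM sh.sales)",
      "user"     -> "sh",
      "password" -> "xx"))
```

The cast back to a numeric type can then be done on the Spark side once the values are known to fit.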

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> 

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees 
accept any responsibility.

 

 




Re: Stored proc with spark

2016-02-16 Thread Gaurav Agarwal
Thanks I will try with the options


Re: Stored proc with spark

2016-02-16 Thread Mich Talebzadeh
You can use JDBC to Oracle to get that data from a given table. What does the
Oracle stored procedure do anyway? How many tables are involved?

JDBC is pretty neat. In the example below I use JDBC to load two Dimension
tables from Oracle in the Spark shell and read the FACT table of 100 million
rows from Hive.

val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

println("\nStarted at"); HiveContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)

//
var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb"
var _username : String = "sh"
var _password : String = "xx"
//

//Get the FACT table from Hive
//
var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM
oraclehadoop.sales")

//Get Oracle tables via JDBC

val c = HiveContext.load("jdbc",
Map("url" -> _ORACLEserver,
"dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM
sh.channels)",
"user" -> _username,
"password" -> _password))

val t = HiveContext.load("jdbc",
Map("url" -> _ORACLEserver,
"dbtable" -> "(SELECT TIME_ID AS TIME_ID, CALENDAR_MONTH_DESC FROM
sh.times)",
"user" -> _username,
"password" -> _password))

// Register the three data frames as temporary tables using the
// registerTempTable() call

s.registerTempTable("t_s")
c.registerTempTable("t_c")
t.registerTempTable("t_t")
//
var sqltext : String = ""
sqltext = """
SELECT rs.Month, rs.SalesChannel, round(TotalSales,2)
FROM
(
SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel,
SUM(t_s.AMOUNT_SOLD) AS TotalSales
FROM t_s, t_t, t_c
WHERE t_s.TIME_ID = t_t.TIME_ID
AND   t_s.CHANNEL_ID = t_c.CHANNEL_ID
GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
ORDER BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
) rs
LIMIT 10
"""
HiveContext.sql(sqltext).collect.foreach(println)
println("\nFinished at"); HiveContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)

sys.exit()
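As an aside, HiveContext.load("jdbc", ...) was deprecated from Spark 1.4 on; assuming Spark 1.4 or later, the same JDBC load can be sketched with the DataFrameReader API (same placeholder connection details as above):

```scala
// Sketch only: the channels load above expressed through the Spark 1.4+
// read API; _ORACLEserver, _username and _password are as defined earlier.
val c = HiveContext.read.format("jdbc")
  .option("url", _ORACLEserver)
  .option("dbtable", "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)")
  .option("user", _username)
  .option("password", _password)
  .load()
```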

 

HTH

--

Dr Mich Talebzadeh

 

LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Cloud Technology
Partners Ltd, its subsidiaries or their employees, unless expressly so
stated. It is the responsibility of the recipient to ensure that this email
is virus free, therefore neither Cloud Technology partners Ltd, its
subsidiaries nor their employees accept any responsibility.

On 16/02/2016 09:04, Gaurav Agarwal wrote:

Hi
Can I load the data into spark from oracle storedproc

Thanks

 

 

 



Re: Stored proc with spark

2016-02-16 Thread Jörn Franke

There are many facets to this topic: you could use Sqoop, the Spark JDBC
driver, the Oracle Hadoop loader, or external tables in Oracle that use
preprocessors to stream directly to compressed CSV files that are then
imported by Spark. It all depends on volumes and on the functional and
non-functional requirements.
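The Spark JDBC route, for instance, can parallelise the read when volumes are large. A sketch (Spark 1.4+ API; the connection details, bounds, and partition column are illustrative placeholders):

```scala
// Sketch only: parallel JDBC read from Oracle, split on a numeric column.
// URL, credentials, bounds and the column choice are placeholders.
val sales = sqlContext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@myhost:1521:mydb")
  .option("dbtable", "sh.sales")
  .option("partitionColumn", "CHANNEL_ID") // must be a numeric column
  .option("lowerBound", "1")
  .option("upperBound", "9")
  .option("numPartitions", "4")
  .option("user", "sh")
  .option("password", "xx")
  .load()
```

Each partition issues its own range-bounded query against Oracle, so the read scales out instead of funnelling through one connection.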

> On 16 Feb 2016, at 10:04, Gaurav Agarwal  wrote:
> 
> Hi
> Can I load the data into spark from oracle storedproc
> 
> Thanks


Re: Stored proc with spark

2016-02-16 Thread Alonso Isidoro Roman
relational databases? what about sqoop?

https://en.wikipedia.org/wiki/Sqoop



Alonso Isidoro Roman.

Mis citas preferidas (de hoy) :
"Si depurar es el proceso de quitar los errores de software, entonces
programar debe ser el proceso de introducirlos..."
 -  Edsger Dijkstra

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting them in..."
  - Edsger Dijkstra

"If you pay peanuts you get monkeys"


2016-02-16 10:04 GMT+01:00 Gaurav Agarwal :

> Hi
> Can I load the data into spark from oracle storedproc
>
> Thanks
>


Re: Stored proc with spark

2016-02-16 Thread Gourav Sengupta
Hi Gaurav,

do you mean stored proc that returns a table?

Regards,
Gourav

On Tue, Feb 16, 2016 at 9:04 AM, Gaurav Agarwal wrote:

> Hi
> Can I load the data into spark from oracle storedproc
>
> Thanks
>
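If the procedure returns (or can be wrapped as) a pipelined table function, Oracle lets it be queried like a table with TABLE(), so Spark's JDBC source can read it through a subquery. A sketch, where sh.get_sales() is a hypothetical pipelined function and the connection details are placeholders:

```scala
// Sketch only: sh.get_sales() is a hypothetical Oracle pipelined table
// function; URL and credentials are placeholders.
val procDf = sqlContext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@myhost:1521:mydb")
  .option("dbtable", "(SELECT * FROM TABLE(sh.get_sales()))")
  .option("user", "sh")
  .option("password", "xx")
  .load()
```

A procedure that only has OUT parameters or side effects produces no table-shaped result, so it would need such a wrapper (or a staging table) before Spark can read from it.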