Thanks, I understand that I can manage a pool on my own when dealing with
foreachPartition, etc.
My question is mainly about what happens in a scenario such as this:

.....
val df: DataFrame = hiveSqlContext.read.format("jdbc").options(options).load()
df.registerTempTable("V_RELATIONS")
.....

I can register df as a temp table for later use in my app.
What happens to the underlying connections used when accessing that
DataFrame? Are they all closed when a timeout is reached, or are they kept
open for later use?
What I'm trying to understand is what happens if I access that table
later, for example via the Thrift server or from my code.
If the connections are closed by then, reopening them on every access of
the DataFrame is a performance penalty.
And for Oracle in particular, opening connections is costly...
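
For what it's worth, one way to sidestep the reopen-per-access cost (not something from this thread, just a sketch of standard Spark usage) is to persist the JDBC-backed DataFrame after loading it, so later queries against the temp table are served from Spark's cache rather than through fresh connections to Oracle. The names `hiveSqlContext` and `options` are assumed from the snippet above:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Sketch only: persist the DataFrame so subsequent queries (from this
// app, or via a Thrift server sharing the same context) hit Spark's
// cache instead of going back to the database over JDBC.
val df: DataFrame = hiveSqlContext.read.format("jdbc").options(options).load()
df.persist(StorageLevel.MEMORY_AND_DISK)
df.registerTempTable("V_RELATIONS")

// The data is actually fetched (and cached) on the first action:
df.count()
```

The trade-off is freshness: cached data won't reflect later changes in Oracle, whereas an unpersisted DataFrame re-runs the JDBC read on each access.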

Does anyone have a better understanding of this?

Thanks again!

Marco



2016-03-25 13:42 GMT+01:00 manasdebashiskar <poorinsp...@gmail.com>:

> Yes there is.
> You can use the default dbcp or your own preferred connection pool manager.
> Then when you ask for a connection you get one from the pool.
>
> Take a look at this
> https://github.com/manasdebashiskar/kafka-exactly-once
> It is forked from Cody's repo.
>
> ..Manas
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-DB-connection-pool-tp26577p26596.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Ing. Marco Colombo
