Sean,
I know that Class.forName has not been required since Java 1.4 :-) It was
just a desperate attempt to make sure that the Postgres driver gets loaded.
Since Class.forName("org.postgresql.Driver") does not throw an exception, I
assume the driver is available on the classpath. Is that not true?

I did some more troubleshooting and here is what I found:
1) The Hive libraries used by Spark use BoneCP 0.7.1.
2) When the Spark master starts, it initializes BoneCP, which does not load
any database driver at that point (that makes sense).
3) When my application initializes BoneCP, it thinks BoneCP is already
initialized and does not load the Postgres driver (this is a known bug in
0.7.1, fixed in the 0.8.0 release).
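
For context, the pool initialization in my app looks roughly like this
(simplified sketch; the URL and credentials are placeholders):

    import com.jolbox.bonecp.{BoneCP, BoneCPConfig}

    val config = new BoneCPConfig()
    config.setJdbcUrl("jdbc:postgresql://hostname:5432/dbname")
    config.setUsername("user")
    config.setPassword("password")
    // With 0.7.1, this constructor skips registering the JDBC driver
    // because BoneCP thinks it was already initialized elsewhere.
    val pool = new BoneCP(config)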

So I linked my app against the BoneCP 0.8.0 release, but when I run it with
spark-submit, Spark continues to use BoneCP 0.7.1. How do I override that
behavior? How do I make the spark-submit script unload BoneCP 0.7.1 and load
BoneCP 0.8.0 instead? I tried the --jars and --driver-class-path flags, but
neither helped.
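
For reference, the command I use looks roughly like this (the class name
and paths are placeholders):

    spark-submit --class com.example.MyApp \
      --jars /path/to/bonecp-0.8.0.RELEASE.jar,/path/to/postgresql-jdbc.jar \
      --driver-class-path /path/to/bonecp-0.8.0.RELEASE.jar \
      /path/to/my-app-assembly.jar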

Thanks,
Mohammed


-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Friday, February 20, 2015 2:06 AM
To: Mohammed Guller
Cc: Kelvin Chu; user@spark.apache.org
Subject: Re: using a database connection pool to write data into an RDBMS from 
a Spark application

Although I don't know if it's related, the Class.forName() method of loading 
drivers is very old. You should be using DataSource and javax.sql; this has 
been the usual practice since about Java 1.4.
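
For example, with the Postgres driver something like this should work (a
minimal sketch; the host, database, and credentials are placeholders):

    import org.postgresql.ds.PGSimpleDataSource

    val ds = new PGSimpleDataSource()
    ds.setServerName("hostname")
    ds.setPortNumber(5432)
    ds.setDatabaseName("dbname")
    ds.setUser("user")
    ds.setPassword("password")
    val conn = ds.getConnection()   // no Class.forName needed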

Why do you say a different driver is being loaded? That's not the error here.

Try instantiating the driver directly to test whether it's available in the 
classpath. Otherwise you would have to check whether the jar exists, the class 
exists in it, and it's really on your classpath.
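
That is, something like:

    // Fails at runtime with NoClassDefFoundError if the driver jar
    // is not actually on the classpath.
    val driver = new org.postgresql.Driver()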

On Fri, Feb 20, 2015 at 5:27 AM, Mohammed Guller <moham...@glassbeam.com> wrote:
> Hi Kelvin,
>
>
>
> Yes. I am creating an uber jar with the Postgres driver included, but
> nevertheless tried both --jars and --driver-class-path flags. It didn't help.
>
>
>
> Interestingly, I can’t use BoneCP even in the driver program when I 
> run my application with spark-submit. I am getting the same exception 
> when the application initializes BoneCP before creating SparkContext. 
> It looks like Spark is loading a different version of the Postgres 
> JDBC driver than the one that I am linking.
>
>
>
> Mohammed
>
>
>
> From: Kelvin Chu [mailto:2dot7kel...@gmail.com]
> Sent: Thursday, February 19, 2015 7:56 PM
> To: Mohammed Guller
> Cc: user@spark.apache.org
> Subject: Re: using a database connection pool to write data into an 
> RDBMS from a Spark application
>
>
>
> Hi Mohammed,
>
>
>
> Did you use --jars to specify your JDBC driver when you submitted your job?
> Take a look at this link:
> http://spark.apache.org/docs/1.2.0/submitting-applications.html
>
>
>
> Hope this helps!
>
>
>
> Kelvin
>
>
>
> On Thu, Feb 19, 2015 at 7:24 PM, Mohammed Guller 
> <moham...@glassbeam.com>
> wrote:
>
> Hi –
>
> I am trying to use BoneCP (a database connection pooling library) to 
> write data from my Spark application to an RDBMS. The database inserts 
> are inside a foreachPartition code block. I am getting this exception 
> when the code tries to insert data using BoneCP:
>
>
>
> java.sql.SQLException: No suitable driver found for 
> jdbc:postgresql://hostname:5432/dbname
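>
> For reference, the foreachPartition block looks roughly like this
> (simplified sketch; the URL, credentials, and insert logic are
> placeholders):
>
>     import com.jolbox.bonecp.{BoneCP, BoneCPConfig}
>
>     rdd.foreachPartition { rows =>
>       val config = new BoneCPConfig()
>       config.setJdbcUrl("jdbc:postgresql://hostname:5432/dbname")
>       config.setUsername("user")
>       config.setPassword("password")
>       // The SQLException above surfaces here, when the pool tries
>       // to open its first connections.
>       val pool = new BoneCP(config)
>       val conn = pool.getConnection()
>       rows.foreach { row =>
>         // INSERT statements using conn go here
>       }
>       conn.close()
>       pool.shutdown()
>     }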
>
>
>
> I tried explicitly loading the Postgres driver on the worker nodes by 
> adding the following line inside the foreachPartition code block:
>
>
>
> Class.forName("org.postgresql.Driver")
>
>
>
> It didn’t help.
>
>
>
> Has anybody been able to get a database connection pool library to work
> with Spark? If you got it working, can you please share the steps?
>
>
>
> Thanks,
>
> Mohammed
>
>
>
>
