Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-26 Thread Benjamin Kim
Chris,

I have a question about your setup. Does it allow the same usage of
Cassandra/HBase data sources? Can I create a table that links to Cassandra or
HBase and have it queried through Spark SQL? I ask because I see the Cassandra
connector package included in your script.
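
For example, something along these lines is what I have in mind. This is just a
sketch based on the DataStax spark-cassandra-connector documentation; the
keyspace and table names are placeholders, not anything from your setup.

CREATE TEMPORARY TABLE events
USING org.apache.spark.sql.cassandra
OPTIONS (
keyspace "my_keyspace",
table "events"
);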

Thanks,
Ben

> On Dec 25, 2015, at 6:41 AM, Chris Fregly wrote:
> 
> Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs
> to be on the Java System Classpath, per the troubleshooting section in the
> Spark SQL programming guide.
> 
> My Spark-based reference pipeline project has an example hive-thrift-server
> start script, as well as an example script that decorates the out-of-the-box
> spark-sql command to use the MySQL JDBC driver.
> 
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined
> in the project's configuration and includes the path to the local MySQL JDBC
> driver. This approach is described in the Spark docs covering the advanced
> spark-submit options.
> 
> Any jar specified with --jars will be passed to each worker node in the 
> cluster - specifically in the work directory for each SparkContext for 
> isolation purposes.
> 
> Cleanup of these jars on the worker nodes is handled by YARN automatically, 
> and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
> 
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, 
> but I couldn't get this to work for whatever reason, so I'm sticking to the
> --jars approach used in my examples.
> 
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim wrote:
> Stephen,
> 
> Let me confirm: I just need to propagate these settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying it out to the cluster manually?
> 
> Thanks for your help,
> Ben
> 
> 
> On Tuesday, December 22, 2015, Stephen Boesch wrote:
> Hi Benjamin, yes, adding it to the Thrift server makes the CREATE TABLE work.
> But queries are executed by the workers, so you need to add the driver to the
> classpath of all nodes for reads to work.
> 
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim:
> Hi Stephen,
> 
> I forgot to mention that I added these lines below to the spark-defaults.conf
> on the node with the Spark SQL Thrift JDBC/ODBC Server running on it. Then, I
> restarted it.
> 
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> 
> I read in another thread that this would work. I was able to create the table 
> and could see it in my SHOW TABLES list. But, when I try to query the table, 
> I get the same error. It looks like I’m getting close.
> 
> Are there any other things that I have to do that you can think of?
> 
> Thanks,
> Ben
> 
> 
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch wrote:
>> 
>> The postgres jdbc driver needs to be added to the  classpath of your spark 
>> workers.  You can do a search for how to do that (multiple ways).
>> 
>> 2015-12-22 17:22 GMT-08:00 b2k70:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly onto a remote PostgreSQL table.
>> 
>> CREATE TEMPORARY TABLE <table_name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>> url "jdbc:postgresql://<host>/<database>",
>> dbtable "impressions"
>> );
>> When I run this against our PostgreSQL server, I get the following error.
>> 
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql://<host>/<database> (state=,code=0)
>> 
>> Can someone help me understand why this is?
>> 
>> Thanks, Ben
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 
>> 
> 
> 
> 
> 
> 
> -- 
> 
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-25 Thread Benjamin Kim
Hi Chris,

I did what you did. It works for me now! Thanks for your help.

Have a Merry Christmas!

Cheers,
Ben


> On Dec 25, 2015, at 6:41 AM, Chris Fregly wrote:
> 
> Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs
> to be on the Java System Classpath, per the troubleshooting section in the
> Spark SQL programming guide.
> 
> My Spark-based reference pipeline project has an example hive-thrift-server
> start script, as well as an example script that decorates the out-of-the-box
> spark-sql command to use the MySQL JDBC driver.
> 
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined
> in the project's configuration and includes the path to the local MySQL JDBC
> driver. This approach is described in the Spark docs covering the advanced
> spark-submit options.
> 
> Any jar specified with --jars will be passed to each worker node in the 
> cluster - specifically in the work directory for each SparkContext for 
> isolation purposes.
> 
> Cleanup of these jars on the worker nodes is handled by YARN automatically, 
> and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
> 
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, 
> but I couldn't get this to work for whatever reason, so I'm sticking to the
> --jars approach used in my examples.
> 
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim wrote:
> Stephen,
> 
> Let me confirm: I just need to propagate these settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying it out to the cluster manually?
> 
> Thanks for your help,
> Ben
> 
> 
> On Tuesday, December 22, 2015, Stephen Boesch wrote:
> Hi Benjamin, yes, adding it to the Thrift server makes the CREATE TABLE work.
> But queries are executed by the workers, so you need to add the driver to the
> classpath of all nodes for reads to work.
> 
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim:
> Hi Stephen,
> 
> I forgot to mention that I added these lines below to the spark-defaults.conf
> on the node with the Spark SQL Thrift JDBC/ODBC Server running on it. Then, I
> restarted it.
> 
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> 
> I read in another thread that this would work. I was able to create the table 
> and could see it in my SHOW TABLES list. But, when I try to query the table, 
> I get the same error. It looks like I’m getting close.
> 
> Are there any other things that I have to do that you can think of?
> 
> Thanks,
> Ben
> 
> 
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch wrote:
>> 
>> The postgres jdbc driver needs to be added to the  classpath of your spark 
>> workers.  You can do a search for how to do that (multiple ways).
>> 
>> 2015-12-22 17:22 GMT-08:00 b2k70:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly onto a remote PostgreSQL table.
>> 
>> CREATE TEMPORARY TABLE <table_name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>> url "jdbc:postgresql://<host>/<database>",
>> dbtable "impressions"
>> );
>> When I run this against our PostgreSQL server, I get the following error.
>> 
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql://<host>/<database> (state=,code=0)
>> 
>> Can someone help me understand why this is?
>> 
>> Thanks, Ben
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 
>> 
> 
> 
> 
> 
> 
> -- 
> 
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc  | http://advancedspark.com 
> 


Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-25 Thread Chris Fregly
Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver
needs to be on the Java System Classpath, per the troubleshooting section
in the Spark SQL programming guide.

My Spark-based reference pipeline project has an example hive-thrift-server
start script, as well as an example script that decorates the out-of-the-box
spark-sql command to use the MySQL JDBC driver.

These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined
in the project's configuration and includes the path to the local MySQL JDBC
driver.  This approach is described in the Spark docs covering the advanced
spark-submit options.

Any jar specified with --jars will be passed to each worker node in the
cluster - specifically in the work directory for each SparkContext for
isolation purposes.

Cleanup of these jars on the worker nodes is handled by YARN automatically,
and by Spark Standalone per the spark.worker.cleanup.appDataTtl config
param.

The Spark SQL programming guide says to use SPARK_CLASSPATH for this
purpose, but I couldn't get this to work for whatever reason, so I'm
sticking to the --jars approach used in my examples.
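
For illustration, the rough shape of such a start script looks like this. This
is a simplified sketch, not the actual script from the repo, and the jar path
and driver version are placeholders:

#!/bin/bash
# point SPARK_SUBMIT_JARS at the locally installed JDBC driver jar(s)
export SPARK_SUBMIT_JARS=/usr/share/java/mysql-connector-java-5.1.38-bin.jar

# start the Spark SQL Thrift server with the driver jar passed via --jars;
# start-thriftserver.sh forwards these options to spark-submit
$SPARK_HOME/sbin/start-thriftserver.sh \
  --jars $SPARK_SUBMIT_JARS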

On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim  wrote:

> Stephen,
>
> Let me confirm: I just need to propagate these settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying it out to the cluster manually?
>
> Thanks for your help,
> Ben
>
>
> On Tuesday, December 22, 2015, Stephen Boesch  wrote:
>
>> Hi Benjamin, yes, adding it to the Thrift server makes the CREATE TABLE work.
>> But queries are executed by the workers, so you need to add the driver to
>> the classpath of all nodes for reads to work.
>>
>> 2015-12-22 18:35 GMT-08:00 Benjamin Kim :
>>
>>> Hi Stephen,
>>>
>>> I forgot to mention that I added these lines below to the
>>> spark-defaults.conf on the node with the Spark SQL Thrift JDBC/ODBC Server
>>> running on it. Then, I restarted it.
>>>
>>>
>>> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>>
>>> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>>
>>> I read in another thread that this would work. I was able to create the
>>> table and could see it in my SHOW TABLES list. But, when I try to query the
>>> table, I get the same error. It looks like I’m getting close.
>>>
>>> Are there any other things that I have to do that you can think of?
>>>
>>> Thanks,
>>> Ben
>>>
>>>
>>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch  wrote:
>>>
>>> The postgres jdbc driver needs to be added to the  classpath of your
>>> spark workers.  You can do a search for how to do that (multiple ways).
>>>
>>> 2015-12-22 17:22 GMT-08:00 b2k70 :
>>>
 I see in the Spark SQL documentation that a temporary table can be
 created
 directly onto a remote PostgreSQL table.

CREATE TEMPORARY TABLE <table_name>
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:postgresql://<host>/<database>",
dbtable "impressions"
);
 When I run this against our PostgreSQL server, I get the following
 error.

Error: java.sql.SQLException: No suitable driver found for
jdbc:postgresql://<host>/<database> (state=,code=0)

 Can someone help me understand why this is?

 Thanks, Ben



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


>>>
>>>
>>


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com


Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
Stephen,

Let me confirm: I just need to propagate these settings I put in
spark-defaults.conf to all the worker nodes? Do I need to do the same with
the PostgreSQL driver jar file too? If so, is there a way to have it read
from HDFS rather than copying it out to the cluster manually?
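
For example, something like the following is what I am hoping is possible. I am
assuming, without having verified it, that --jars also accepts an hdfs:// URL
in addition to a local path; the paths below are placeholders.

# sketch: reference the driver jar from HDFS instead of a local copy on each node
$SPARK_HOME/sbin/start-thriftserver.sh \
  --jars hdfs:///libs/postgresql-9.3-1104.jdbc41.jar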

Thanks for your help,
Ben

On Tuesday, December 22, 2015, Stephen Boesch  wrote:

> Hi Benjamin, yes, adding it to the Thrift server makes the CREATE TABLE work.
> But queries are executed by the workers, so you need to add the driver to
> the classpath of all nodes for reads to work.
>
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim:
>> Hi Stephen,
>>
>> I forgot to mention that I added these lines below to the
>> spark-defaults.conf on the node with the Spark SQL Thrift JDBC/ODBC Server
>> running on it. Then, I restarted it.
>>
>> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>
>> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>
>> I read in another thread that this would work. I was able to create the
>> table and could see it in my SHOW TABLES list. But, when I try to query the
>> table, I get the same error. It looks like I’m getting close.
>>
>> Are there any other things that I have to do that you can think of?
>>
>> Thanks,
>> Ben
>>
>>
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch wrote:
>> The postgres jdbc driver needs to be added to the  classpath of your
>> spark workers.  You can do a search for how to do that (multiple ways).
>>
>> 2015-12-22 17:22 GMT-08:00 b2k70:
>>> I see in the Spark SQL documentation that a temporary table can be
>>> created
>>> directly onto a remote PostgreSQL table.
>>>
>>> CREATE TEMPORARY TABLE <table_name>
>>> USING org.apache.spark.sql.jdbc
>>> OPTIONS (
>>> url "jdbc:postgresql://<host>/<database>",
>>> dbtable "impressions"
>>> );
>>> When I run this against our PostgreSQL server, I get the following error.
>>>
>>> Error: java.sql.SQLException: No suitable driver found for
>>> jdbc:postgresql://<host>/<database> (state=,code=0)
>>>
>>> Can someone help me understand why this is?
>>>
>>> Thanks, Ben
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> 
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>> 
>>>
>>>
>>
>>
>


Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Stephen Boesch
Hi Benjamin, yes, adding it to the Thrift server makes the CREATE TABLE work.
But queries are executed by the workers, so you need to add the driver to the
classpath of all nodes for reads to work.
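
For example, one common setup looks like the following. This is only a sketch;
the jar path is an illustration:

# application config, e.g. in spark-defaults.conf where the job is launched
spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

# note: the driver jar itself must also exist at that same local path on every
# worker node, since each executor resolves the path on its own filesystem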

2015-12-22 18:35 GMT-08:00 Benjamin Kim :

> Hi Stephen,
>
> I forgot to mention that I added these lines below to the
> spark-defaults.conf on the node with the Spark SQL Thrift JDBC/ODBC Server
> running on it. Then, I restarted it.
>
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>
> I read in another thread that this would work. I was able to create the
> table and could see it in my SHOW TABLES list. But, when I try to query the
> table, I get the same error. It looks like I’m getting close.
>
> Are there any other things that I have to do that you can think of?
>
> Thanks,
> Ben
>
>
> On Dec 22, 2015, at 6:25 PM, Stephen Boesch  wrote:
>
> The postgres jdbc driver needs to be added to the  classpath of your spark
> workers.  You can do a search for how to do that (multiple ways).
>
> 2015-12-22 17:22 GMT-08:00 b2k70 :
>
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly onto a remote PostgreSQL table.
>>
>> CREATE TEMPORARY TABLE <table_name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>> url "jdbc:postgresql://<host>/<database>",
>> dbtable "impressions"
>> );
>> When I run this against our PostgreSQL server, I get the following error.
>>
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql://<host>/<database> (state=,code=0)
>>
>> Can someone help me understand why this is?
>>
>> Thanks, Ben
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
>


Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
Hi Stephen,

I forgot to mention that I added these lines below to the spark-defaults.conf on
the node with the Spark SQL Thrift JDBC/ODBC Server running on it. Then, I
restarted it.

spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

I read in another thread that this would work. I was able to create the table 
and could see it in my SHOW TABLES list. But, when I try to query the table, I 
get the same error. It looks like I’m getting close.
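
One more thing I plan to try. This is not something mentioned in this thread,
just a guess based on the JDBC data source options listed in the Spark SQL
docs: naming the driver class explicitly so Spark loads it before opening the
connection.

CREATE TEMPORARY TABLE impressions
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:postgresql://<host>/<database>",
dbtable "impressions",
driver "org.postgresql.Driver"
);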

Are there any other things that I have to do that you can think of?

Thanks,
Ben


> On Dec 22, 2015, at 6:25 PM, Stephen Boesch  wrote:
> 
> The postgres jdbc driver needs to be added to the  classpath of your spark 
> workers.  You can do a search for how to do that (multiple ways).
> 
> 2015-12-22 17:22 GMT-08:00 b2k70:
> I see in the Spark SQL documentation that a temporary table can be created
> directly onto a remote PostgreSQL table.
> 
> CREATE TEMPORARY TABLE <table_name>
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:postgresql://<host>/<database>",
> dbtable "impressions"
> );
> When I run this against our PostgreSQL server, I get the following error.
> 
> Error: java.sql.SQLException: No suitable driver found for
> jdbc:postgresql://<host>/<database> (state=,code=0)
> 
> Can someone help me understand why this is?
> 
> Thanks, Ben
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> 
> For additional commands, e-mail: user-h...@spark.apache.org 
> 
> 
> 



Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Stephen Boesch
The PostgreSQL JDBC driver needs to be added to the classpath of your Spark
workers. You can do a search for how to do that (there are multiple ways).

2015-12-22 17:22 GMT-08:00 b2k70 :

> I see in the Spark SQL documentation that a temporary table can be created
> directly onto a remote PostgreSQL table.
>
> CREATE TEMPORARY TABLE <table_name>
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:postgresql://<host>/<database>",
> dbtable "impressions"
> );
> When I run this against our PostgreSQL server, I get the following error.
>
> Error: java.sql.SQLException: No suitable driver found for
> jdbc:postgresql://<host>/<database> (state=,code=0)
>
> Can someone help me understand why this is?
>
> Thanks, Ben
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>