Re: Spark SQL JDBC teradata syntax error

2019-05-03 Thread Gourav Sengupta
What is the query?

On Fri, May 3, 2019 at 5:28 PM KhajaAsmath Mohammed 
wrote:

> Hi
>
> I have followed the link
> https://community.teradata.com/t5/Connectivity/Teradata-JDBC-Driver-returns-the-wrong-schema-column-nullability/m-p/77824
> to connect to Teradata from Spark.
>
> I was able to print the schema if I give a table name instead of a SQL query.
>
> I am getting the error below if I give a query (code snippet from the above link).
> Any help is appreciated.
>
> Exception in thread "main" java.sql.SQLException: [Teradata Database]
> [TeraJDBC 16.20.00.10] [Error 3707] [SQLState 42000] Syntax error, expected
> something like an 'EXCEPT' keyword or an 'UNION' keyword or a 'MINUS'
> keyword between the word 'VEHP91_BOM' and '?'.
> at
> com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDatabaseSQLException(ErrorFactory.java:309)
> at
> com.teradata.jdbc.jdbc_4.statemachine.ReceiveInitSubState.action(ReceiveInitSubState.java:103)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:311)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:200)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:137)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementController.run(StatementController.java:128)
> at
> com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:389)
> at
> com.teradata.jdbc.jdbc_4.TDStatement.prepareRequest(TDStatement.java:576)
> at
> com.teradata.jdbc.jdbc_4.TDPreparedStatement.(TDPreparedStatement.java:131)
> at
> com.teradata.jdbc.jdk6.JDK6_SQL_PreparedStatement.(JDK6_SQL_PreparedStatement.java:30)
> at
> com.teradata.jdbc.jdk6.JDK6_SQL_Connection.constructPreparedStatement(JDK6_SQL_Connection.java:82)
> at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1337)
> at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1381)
> at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1367)
>
>
> Thanks,
> Asmath
>


Spark SQL JDBC teradata syntax error

2019-05-03 Thread KhajaAsmath Mohammed
Hi,

I have followed the link
https://community.teradata.com/t5/Connectivity/Teradata-JDBC-Driver-returns-the-wrong-schema-column-nullability/m-p/77824
to connect to Teradata from Spark.

I was able to print the schema if I give a table name instead of a SQL query.

I am getting the error below if I give a query (code snippet from the above link). Any
help is appreciated.

Exception in thread "main" java.sql.SQLException: [Teradata Database]
[TeraJDBC 16.20.00.10] [Error 3707] [SQLState 42000] Syntax error, expected
something like an 'EXCEPT' keyword or an 'UNION' keyword or a 'MINUS'
keyword between the word 'VEHP91_BOM' and '?'.
at
com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDatabaseSQLException(ErrorFactory.java:309)
at
com.teradata.jdbc.jdbc_4.statemachine.ReceiveInitSubState.action(ReceiveInitSubState.java:103)
at
com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:311)
at
com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:200)
at
com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:137)
at
com.teradata.jdbc.jdbc_4.statemachine.StatementController.run(StatementController.java:128)
at
com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:389)
at com.teradata.jdbc.jdbc_4.TDStatement.prepareRequest(TDStatement.java:576)
at
com.teradata.jdbc.jdbc_4.TDPreparedStatement.(TDPreparedStatement.java:131)
at
com.teradata.jdbc.jdk6.JDK6_SQL_PreparedStatement.(JDK6_SQL_PreparedStatement.java:30)
at
com.teradata.jdbc.jdk6.JDK6_SQL_Connection.constructPreparedStatement(JDK6_SQL_Connection.java:82)
at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1337)
at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1381)
at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1367)


Thanks,
Asmath
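
For reference, the 3707 error usually means the Teradata driver received SQL text in a place where it expected a table name. A minimal sketch (my assumption, not confirmed in this thread) of pushing a query down through the Spark JDBC source by wrapping it as a parenthesized derived table with an alias; the host, database, columns, and credentials below are placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TeradataQuerySketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("TeradataQuerySketch").getOrCreate();

    // Wrap the SQL as a derived table with an alias so the driver sees a valid
    // table expression instead of a bare SELECT statement.
    Dataset<Row> df = spark.read()
        .format("jdbc")
        .option("url", "jdbc:teradata://tdhost/DATABASE=mydb")              // placeholder host/database
        .option("driver", "com.teradata.jdbc.TeraDriver")
        .option("dbtable", "(SELECT part_no, qty FROM VEHP91_BOM) AS bom")  // placeholder columns
        .option("user", "dbuser")
        .option("password", "dbpass")
        .load();

    df.printSchema();
    spark.stop();
  }
}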


[Spark SQL] JDBC connection from UDF

2017-07-03 Thread Patrik Medvedev
Hello guys,

I'm using Spark SQL with Hive through Thrift.
I need this because I need to create a table from a table mask.
Here is an example:
1. Take tables by mask, like SHOW TABLES IN db 'table__*'
2. Create query like:
CREATE TABLE total_data AS
SELECT * FROM table__1
UNION ALL
SELECT * FROM table__2
UNION ALL
SELECT * FROM table__3

Because of this, I need to create a JDBC connection inside a UDF. The problem is that
I need to create the connection dynamically, which means I need to take the
host name, port, and user name from the Hive properties. This is easy from Hive,
using these properties:
host - hive.server2.thrift.bind.host
port - hive.server2.thrift.port
user - always the same as the user that ran the UDF

The problem is that the hive.server2.thrift.bind.host parameter is not defined in
YARN, and the user that runs the UDF is hive.
Maybe you have a solution for how I can get the host name and, more importantly,
how I can run the UDF as the user that ran the SQL (not the hive user).
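
If the union statement can be assembled on the driver side rather than inside a UDF, a rough sketch of steps 1-2 with a plain SparkSession could look like the following; the database and table names are placeholders, and the exact column layout of SHOW TABLES may differ between Spark versions:

import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class UnionByMaskSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("UnionByMaskSketch")
        .enableHiveSupport()
        .getOrCreate();

    // 1. Take tables by mask.
    List<Row> tables = spark.sql("SHOW TABLES IN db LIKE 'table__*'").collectAsList();

    // 2. Build the UNION ALL query from the matched table names.
    StringBuilder union = new StringBuilder();
    for (Row t : tables) {
      String tableName = t.getString(t.fieldIndex("tableName"));
      if (union.length() > 0) {
        union.append(" UNION ALL ");
      }
      union.append("SELECT * FROM db.").append(tableName);
    }
    spark.sql("CREATE TABLE total_data AS " + union);
    spark.stop();
  }
}

This sketch does not address the harder part of the question (reading hive.server2.thrift.bind.host and the calling user from inside a UDF running on YARN).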


Spark SQL -JDBC connectivity

2016-08-09 Thread Soni spark
Hi,

I would like to know the steps to connect to Spark SQL from the Spring framework
(web UI), and also how to run and deploy the web application.
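
One common approach, sketched below under the assumption that the Spark SQL Thrift/JDBC server is already running (host, port, and table name are placeholders), is to treat it like any other JDBC source from the Spring side, using the Hive JDBC driver:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerJdbcSketch {
  public static void main(String[] args) throws Exception {
    // The Spark SQL Thrift server speaks the HiveServer2 protocol, so the
    // standard Hive JDBC driver can be used from a Spring bean or DAO.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://sparkhost:10000/default", "user", "");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT count(*) FROM some_table")) {
      while (rs.next()) {
        System.out.println(rs.getLong(1));
      }
    }
  }
}

The web application itself can then be built and deployed like any other Spring web app; Spark does not have to run inside the servlet container.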


Re: spark-sql jdbc dataframe mysql data type issue

2016-06-25 Thread Mich Talebzadeh
Please select 10 sample rows for the columns id and ctime from each table (MySQL
and Spark) and post the output.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 June 2016 at 13:36, 刘虓  wrote:

> Hi,
> I came across this strange behavior in Apache Spark 1.6.1:
> when I was reading a MySQL table into a Spark DataFrame, a column of data type
> float got mapped to double.
>
> dataframe schema:
>
> root
>
>  |-- id: long (nullable = true)
>
>  |-- ctime: double (nullable = true)
>
>  |-- atime: double (nullable = true)
>
> mysql schema:
>
> mysql> desc test.user_action_2;
>
> +-------+------------------+------+-----+---------+-------+
> | Field | Type             | Null | Key | Default | Extra |
> +-------+------------------+------+-----+---------+-------+
> | id    | int(10) unsigned | YES  |     | NULL    |       |
> | ctime | float            | YES  |     | NULL    |       |
> | atime | double           | YES  |     | NULL    |       |
> +-------+------------------+------+-----+---------+-------+
>
> I wonder if anyone has seen this behavior before.
>


spark-sql jdbc dataframe mysql data type issue

2016-06-25 Thread 刘虓
Hi,
I came across this strange behavior in Apache Spark 1.6.1:
when I was reading a MySQL table into a Spark DataFrame, a column of data type
float got mapped to double.

dataframe schema:

root

 |-- id: long (nullable = true)

 |-- ctime: double (nullable = true)

 |-- atime: double (nullable = true)

mysql schema:

mysql> desc test.user_action_2;

+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| id    | int(10) unsigned | YES  |     | NULL    |       |
| ctime | float            | YES  |     | NULL    |       |
| atime | double           | YES  |     | NULL    |       |
+-------+------------------+------+-----+---------+-------+

I wonder if anyone has seen this behavior before.
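
The values themselves are not changed by the wider mapping; if the narrower type matters downstream, one possible workaround (a sketch on my part, assuming Spark 1.6 and placeholder connection details) is to cast the column back after loading:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class MysqlFloatCastSketch {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("MysqlFloatCastSketch"));
    SQLContext sqlContext = new SQLContext(jsc);

    DataFrame df = sqlContext.read()
        .format("jdbc")
        .option("url", "jdbc:mysql://dbhost/test")   // placeholder host
        .option("dbtable", "user_action_2")
        .option("user", "dbuser")
        .option("password", "dbpass")
        .load();

    // ctime arrives as double; cast it back to float to restore the narrower type.
    DataFrame fixed = df.withColumn("ctime", df.col("ctime").cast("float"));
    fixed.printSchema();
    jsc.stop();
  }
}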


Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-28 Thread @Sanjiv Singh
It is working now...

I checked the Spark worker UI; executor startup was failing with the error below,
with JVM initialization failing because of a wrong -Xms value:

Invalid initial heap size: -Xms0M
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

The Thrift server was not picking up the executor memory from *spark-env.sh*, so I
added it explicitly in the Thrift server startup script.

*./sbin/start-thriftserver.sh*

exec "$FWDIR"/sbin/spark-daemon.sh spark-submit $CLASS 1
--executor-memory 512M "$@"

With this, executors start with valid memory and JDBC queries return results.

*conf/spark-env.sh* (executor memory configuration not picked up by the Thrift server)

export SPARK_JAVA_OPTS="-Dspark.executor.memory=512M"
export SPARK_EXECUTOR_MEMORY=512M


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Thu, Jan 28, 2016 at 10:57 PM, @Sanjiv Singh 
wrote:

> Adding to it
>
> job status at UI :
>
> Stage Id: 1
> Description: select ename from employeetest - collect at SparkPlan.scala:84
> (<http://impetus-d951centos:4040/stages/stage?id=1&attempt=0>)
> Submitted: 2016/01/29 04:20:06
> Duration: 3.0 min
> Tasks (Succeeded/Total): 0/2
>
> Getting below exception on Spark UI :
>
> org.apache.spark.rdd.RDD.collect(RDD.scala:813)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
> org.apache.spark.sql.DataFrame.collect(DataFrame.scala:887)
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Thu, Jan 28, 2016 at 9:57 PM, @Sanjiv Singh 
> wrote:
>
>> Any help on this.
>>
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
>>
>> On Wed, Jan 27, 2016 at 10:25 PM, @Sanjiv Singh 
>> wrote:
>>
>>> Hi Ted ,
>>> Its typo.
>>>
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>> On Wed, Jan 27, 2016 at 9:13 PM, Ted Yu  wrote:
>>>
>>>> In the last snippet, temptable is shown by 'show tables' command.
>>>> Yet you queried tampTable.
>>>>
>>>> I believe this just was typo :-)
>>>>
>>>> On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have configured Spark to query on hive table.
>>>>>
>>>>> Run the Thrift JDBC/ODBC server using below command :
>>>>>
>>>>> *cd $SPARK_HOME*
>>>>> *./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
>>>>> hive.server2.thrift.bind.host=myhost --hiveconf
>>>>> hive.server2.thrift.port=*
>>>>>
>>>>> and also able to connect through beeline
>>>>>
>>>>> *beeline>* !connect jdbc:hive2://192.168.145.20:
>>>>> Enter username for jdbc:hive2://192.168.145.20:: root
>>>>> Enter password for jdbc:hive2://192.168.145.20:: impetus
>>>>> *beeline > *
>>>>>
>>>>> It is not giving query result on hive table through Spark JDBC, but it
>>>>> is working with spark HiveSQLContext. See complete scenario explain below.
>>>>>
>>>>> Help me understand the issue why Spark SQL JDBC is n

Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-28 Thread @Sanjiv Singh
Adding to it

job status at UI :

Stage Id: 1
Description: select ename from employeetest - collect at SparkPlan.scala:84
(<http://impetus-d951centos:4040/stages/stage?id=1&attempt=0>)
Submitted: 2016/01/29 04:20:06
Duration: 3.0 min
Tasks (Succeeded/Total): 0/2

Getting below exception on Spark UI :

org.apache.spark.rdd.RDD.collect(RDD.scala:813)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
org.apache.spark.sql.DataFrame.collect(DataFrame.scala:887)
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Thu, Jan 28, 2016 at 9:57 PM, @Sanjiv Singh 
wrote:

> Any help on this.
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Wed, Jan 27, 2016 at 10:25 PM, @Sanjiv Singh 
> wrote:
>
>> Hi Ted ,
>> Its typo.
>>
>>
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
>>
>> On Wed, Jan 27, 2016 at 9:13 PM, Ted Yu  wrote:
>>
>>> In the last snippet, temptable is shown by 'show tables' command.
>>> Yet you queried tampTable.
>>>
>>> I believe this just was typo :-)
>>>
>>> On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh 
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have configured Spark to query on hive table.
>>>>
>>>> Run the Thrift JDBC/ODBC server using below command :
>>>>
>>>> *cd $SPARK_HOME*
>>>> *./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
>>>> hive.server2.thrift.bind.host=myhost --hiveconf
>>>> hive.server2.thrift.port=*
>>>>
>>>> and also able to connect through beeline
>>>>
>>>> *beeline>* !connect jdbc:hive2://192.168.145.20:9999
>>>> Enter username for jdbc:hive2://192.168.145.20:: root
>>>> Enter password for jdbc:hive2://192.168.145.20:: impetus
>>>> *beeline > *
>>>>
>>>> It is not giving query result on hive table through Spark JDBC, but it
>>>> is working with spark HiveSQLContext. See complete scenario explain below.
>>>>
>>>> Help me understand the issue why Spark SQL JDBC is not giving result ?
>>>>
>>>> Below are version details.
>>>>
>>>> *Hive Version  : 1.2.1*
>>>> *Hadoop Version :  2.6.0*
>>>> *Spark version:  1.3.1*
>>>>
>>>> Let me know if need other details.
>>>>
>>>>
>>>> *Created Hive Table , insert some records and query it :*
>>>>
>>>> *beeline> !connect jdbc:hive2://myhost:1*
>>>> Enter username for jdbc:hive2://myhost:1: root
>>>> Enter password for jdbc:hive2://myhost:1: **
>>>> *beeline> create table tampTable(id int ,name string ) clustered by
>>>> (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');*
>>>> *beeline> insert into table tampTable values
>>>> (1,'row1'),(2,'row2'),(3,'row3');*
>>>> *beeline> select name from tampTable;*
>>>> name
>>>> -
>>>> row1
>>>> row3
>>>> row2
>>>>
>>>> *Query through SparkSQL HiveSQLContext :*
>>>>
>>>> SparkConf sparkConf = new SparkConf().setAppName("JavaSpar

Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-28 Thread @Sanjiv Singh
Any help on this.

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Jan 27, 2016 at 10:25 PM, @Sanjiv Singh 
wrote:

> Hi Ted ,
> Its typo.
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Wed, Jan 27, 2016 at 9:13 PM, Ted Yu  wrote:
>
>> In the last snippet, temptable is shown by 'show tables' command.
>> Yet you queried tampTable.
>>
>> I believe this just was typo :-)
>>
>> On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have configured Spark to query on hive table.
>>>
>>> Run the Thrift JDBC/ODBC server using below command :
>>>
>>> *cd $SPARK_HOME*
>>> *./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
>>> hive.server2.thrift.bind.host=myhost --hiveconf
>>> hive.server2.thrift.port=*
>>>
>>> and also able to connect through beeline
>>>
>>> *beeline>* !connect jdbc:hive2://192.168.145.20:
>>> Enter username for jdbc:hive2://192.168.145.20:: root
>>> Enter password for jdbc:hive2://192.168.145.20:9999: impetus
>>> *beeline > *
>>>
>>> It is not giving query result on hive table through Spark JDBC, but it
>>> is working with spark HiveSQLContext. See complete scenario explain below.
>>>
>>> Help me understand the issue why Spark SQL JDBC is not giving result ?
>>>
>>> Below are version details.
>>>
>>> *Hive Version  : 1.2.1*
>>> *Hadoop Version :  2.6.0*
>>> *Spark version:  1.3.1*
>>>
>>> Let me know if need other details.
>>>
>>>
>>> *Created Hive Table , insert some records and query it :*
>>>
>>> *beeline> !connect jdbc:hive2://myhost:1*
>>> Enter username for jdbc:hive2://myhost:1: root
>>> Enter password for jdbc:hive2://myhost:1: **
>>> *beeline> create table tampTable(id int ,name string ) clustered by (id)
>>> into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');*
>>> *beeline> insert into table tampTable values
>>> (1,'row1'),(2,'row2'),(3,'row3');*
>>> *beeline> select name from tampTable;*
>>> name
>>> -
>>> row1
>>> row3
>>> row2
>>>
>>> *Query through SparkSQL HiveSQLContext :*
>>>
>>> SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
>>> SparkContext sc = new SparkContext(sparkConf);
>>> HiveContext hiveContext = new HiveContext(sc);
>>> DataFrame teenagers = hiveContext.sql("*SELECT name FROM tampTable*");
>>> List<String> teenagerNames = teenagers.toJavaRDD().map(new Function<Row, String>() {
>>>  @Override
>>>  public String call(Row row) {
>>>  return "Name: " + row.getString(0);
>>>  }
>>> }).collect();
>>> for (String name: teenagerNames) {
>>>  System.out.println(name);
>>> }
>>> teenagers.toJavaRDD().saveAsTextFile("/tmp1");
>>> sc.stop();
>>>
>>> which is working perfectly and giving all names from table *tempTable*
>>>
>>> *Query through Spark SQL JDBC :*
>>>
>>> *beeline> !connect jdbc:hive2://myhost:*
>>> Enter username for jdbc:hive2://myhost:: root
>>> Enter password for jdbc:hive2://myhost:: **
>>> *beeline> show tables;*
>>> *temptable*
>>> *..other tables*
>>> beeline> *SELECT name FROM tampTable;*
>>>
>>> I can list the table through "show tables", but when I run the query, it
>>> either hangs or returns nothing.
>>>
>>>
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>
>>
>


Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-27 Thread @Sanjiv Singh
Hi Ted ,
Its typo.


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Jan 27, 2016 at 9:13 PM, Ted Yu  wrote:

> In the last snippet, temptable is shown by 'show tables' command.
> Yet you queried tampTable.
>
> I believe this just was typo :-)
>
> On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh 
> wrote:
>
>> Hi All,
>>
>> I have configured Spark to query on hive table.
>>
>> Run the Thrift JDBC/ODBC server using below command :
>>
>> *cd $SPARK_HOME*
>> *./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
>> hive.server2.thrift.bind.host=myhost --hiveconf
>> hive.server2.thrift.port=*
>>
>> and also able to connect through beeline
>>
>> *beeline>* !connect jdbc:hive2://192.168.145.20:
>> Enter username for jdbc:hive2://192.168.145.20:: root
>> Enter password for jdbc:hive2://192.168.145.20:: impetus
>> *beeline > *
>>
>> It is not giving query result on hive table through Spark JDBC, but it is
>> working with spark HiveSQLContext. See complete scenario explain below.
>>
>> Help me understand the issue why Spark SQL JDBC is not giving result ?
>>
>> Below are version details.
>>
>> *Hive Version  : 1.2.1*
>> *Hadoop Version :  2.6.0*
>> *Spark version:  1.3.1*
>>
>> Let me know if need other details.
>>
>>
>> *Created Hive Table , insert some records and query it :*
>>
>> *beeline> !connect jdbc:hive2://myhost:1*
>> Enter username for jdbc:hive2://myhost:1: root
>> Enter password for jdbc:hive2://myhost:1: **
>> *beeline> create table tampTable(id int ,name string ) clustered by (id)
>> into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');*
>> *beeline> insert into table tampTable values
>> (1,'row1'),(2,'row2'),(3,'row3');*
>> *beeline> select name from tampTable;*
>> name
>> -
>> row1
>> row3
>> row2
>>
>> *Query through SparkSQL HiveSQLContext :*
>>
>> SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
>> SparkContext sc = new SparkContext(sparkConf);
>> HiveContext hiveContext = new HiveContext(sc);
>> DataFrame teenagers = hiveContext.sql("*SELECT name FROM tampTable*");
>> List<String> teenagerNames = teenagers.toJavaRDD().map(new Function<Row, String>() {
>>  @Override
>>  public String call(Row row) {
>>  return "Name: " + row.getString(0);
>>  }
>> }).collect();
>> for (String name: teenagerNames) {
>>  System.out.println(name);
>> }
>> teenagers.toJavaRDD().saveAsTextFile("/tmp1");
>> sc.stop();
>>
>> which is working perfectly and giving all names from table *tempTable*
>>
>> *Query through Spark SQL JDBC :*
>>
>> *beeline> !connect jdbc:hive2://myhost:*
>> Enter username for jdbc:hive2://myhost:: root
>> Enter password for jdbc:hive2://myhost:: **
>> *beeline> show tables;*
>> *temptable*
>> *..other tables*
>> beeline> *SELECT name FROM tampTable;*
>>
>> I can list the table through "show tables", but when I run the query, it
>> either hangs or returns nothing.
>>
>>
>>
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
>>
>
>


Re: Having issue with Spark SQL JDBC on hive table !!!

2016-01-27 Thread Ted Yu
In the last snippet, temptable is shown by 'show tables' command.
Yet you queried tampTable.

I believe this just was typo :-)

On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh 
wrote:

> Hi All,
>
> I have configured Spark to query on hive table.
>
> Run the Thrift JDBC/ODBC server using below command :
>
> *cd $SPARK_HOME*
> *./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
> hive.server2.thrift.bind.host=myhost --hiveconf
> hive.server2.thrift.port=*
>
> and also able to connect through beeline
>
> *beeline>* !connect jdbc:hive2://192.168.145.20:
> Enter username for jdbc:hive2://192.168.145.20:: root
> Enter password for jdbc:hive2://192.168.145.20:: impetus
> *beeline > *
>
> It is not giving query result on hive table through Spark JDBC, but it is
> working with spark HiveSQLContext. See complete scenario explain below.
>
> Help me understand the issue why Spark SQL JDBC is not giving result ?
>
> Below are version details.
>
> *Hive Version  : 1.2.1*
> *Hadoop Version :  2.6.0*
> *Spark version:  1.3.1*
>
> Let me know if need other details.
>
>
> *Created Hive Table , insert some records and query it :*
>
> *beeline> !connect jdbc:hive2://myhost:1*
> Enter username for jdbc:hive2://myhost:1: root
> Enter password for jdbc:hive2://myhost:1: **
> *beeline> create table tampTable(id int ,name string ) clustered by (id)
> into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');*
> *beeline> insert into table tampTable values
> (1,'row1'),(2,'row2'),(3,'row3');*
> *beeline> select name from tampTable;*
> name
> -
> row1
> row3
> row2
>
> *Query through SparkSQL HiveSQLContext :*
>
> SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
> SparkContext sc = new SparkContext(sparkConf);
> HiveContext hiveContext = new HiveContext(sc);
> DataFrame teenagers = hiveContext.sql("*SELECT name FROM tampTable*");
> List<String> teenagerNames = teenagers.toJavaRDD().map(new Function<Row, String>() {
>  @Override
>  public String call(Row row) {
>  return "Name: " + row.getString(0);
>  }
> }).collect();
> for (String name: teenagerNames) {
>  System.out.println(name);
> }
> teenagers.toJavaRDD().saveAsTextFile("/tmp1");
> sc.stop();
>
> which is working perfectly and giving all names from table *tempTable*
>
> *Query through Spark SQL JDBC :*
>
> *beeline> !connect jdbc:hive2://myhost:*
> Enter username for jdbc:hive2://myhost:: root
> Enter password for jdbc:hive2://myhost:: **
> *beeline> show tables;*
> *temptable*
> *..other tables*
> beeline> *SELECT name FROM tampTable;*
>
> I can list the table through "show tables", but when I run the query, it
> either hangs or returns nothing.
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>


Having issue with Spark SQL JDBC on hive table !!!

2016-01-27 Thread @Sanjiv Singh
Hi All,

I have configured Spark to query on hive table.

Run the Thrift JDBC/ODBC server using below command :

*cd $SPARK_HOME*
*./sbin/start-thriftserver.sh --master spark://myhost:7077 --hiveconf
hive.server2.thrift.bind.host=myhost --hiveconf
hive.server2.thrift.port=*

and also able to connect through beeline

*beeline>* !connect jdbc:hive2://192.168.145.20:
Enter username for jdbc:hive2://192.168.145.20:: root
Enter password for jdbc:hive2://192.168.145.20:: impetus
*beeline > *

It is not giving query results on the Hive table through Spark JDBC, but it is
working with the Spark HiveSQLContext. See the complete scenario explained below.

Help me understand why Spark SQL JDBC is not giving results.

Below are version details.

*Hive Version  : 1.2.1*
*Hadoop Version :  2.6.0*
*Spark version:  1.3.1*

Let me know if you need other details.


*Created Hive Table , insert some records and query it :*

*beeline> !connect jdbc:hive2://myhost:1*
Enter username for jdbc:hive2://myhost:1: root
Enter password for jdbc:hive2://myhost:1: **
*beeline> create table tampTable(id int ,name string ) clustered by (id)
into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');*
*beeline> insert into table tampTable values
(1,'row1'),(2,'row2'),(3,'row3');*
*beeline> select name from tampTable;*
name
-
row1
row3
row2

*Query through SparkSQL HiveSQLContext :*

SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
SparkContext sc = new SparkContext(sparkConf);
HiveContext hiveContext = new HiveContext(sc);
DataFrame teenagers = hiveContext.sql("*SELECT name FROM tampTable*");
List<String> teenagerNames = teenagers.toJavaRDD().map(new Function<Row, String>() {
 @Override
 public String call(Row row) {
 return "Name: " + row.getString(0);
 }
}).collect();
for (String name: teenagerNames) {
 System.out.println(name);
}
teenagers.toJavaRDD().saveAsTextFile("/tmp1");
sc.stop();

which is working perfectly and giving all names from table *tempTable*

*Query through Spark SQL JDBC :*

*beeline> !connect jdbc:hive2://myhost:*
Enter username for jdbc:hive2://myhost:: root
Enter password for jdbc:hive2://myhost:: **
*beeline> show tables;*
*temptable*
*..other tables*
beeline> *SELECT name FROM tampTable;*

I can list the table through "show tables", but when I run the query, it
either hangs or returns nothing.



Regards
Sanjiv Singh
Mob :  +091 9990-447-339


Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-06 Thread Madabhattula Rajesh Kumar
Thank you Richard

Regards,
Rajesh

On Fri, Nov 6, 2015 at 10:10 PM, Richard Hillegas 
wrote:

> Hi Rajesh,
>
> The 1.6 schedule is available on the front page of the Spark wiki:
> https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage. I don't
> know of any workarounds for this problem.
>
> Thanks,
> Rick
>
>
> Madabhattula Rajesh Kumar  wrote on 11/05/2015
> 06:35:22 PM:
>
> > From: Madabhattula Rajesh Kumar 
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: "u...@spark.incubator.apache.org"
> > , "user@spark.apache.org"
> > 
> > Date: 11/05/2015 06:35 PM
> > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
> >
> > Hi Richard,
>
> > Thank you for the updates. Do you know tentative timeline for 1.6
> > release? Mean while, any workaround solution for this issue?
>
> > Regards,
> > Rajesh
> >
>
> >
> > On Thu, Nov 5, 2015 at 10:57 PM, Richard Hillegas 
> wrote:
> > Or you may be referring to
> https://issues.apache.org/jira/browse/SPARK-10648
> > . That issue has a couple pull requests but I think that the limited
> > bandwidth of the committers still applies.
> >
> > Thanks,
> > Rick
> >
> >
> > Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42
> AM:
> >
> > > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > > To: Madabhattula Rajesh Kumar 
> > > Cc: "user@spark.apache.org" ,
> > > "u...@spark.incubator.apache.org" 
> > > Date: 11/05/2015 09:17 AM
> > > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
> >
> > >
> > > Hi Rajesh,
> > >
> > > I think that you may be referring to https://issues.apache.org/jira/
> > > browse/SPARK-10909. A pull request on that issue was submitted more
> > > than a month ago but it has not been committed. I think that the
> > > committers are busy working on issues which were targeted for 1.6
> > > and I doubt that they will have the spare cycles to vet that pull
> request.
> > >
> > > Thanks,
> > > Rick
> > >
> > >
> > > Madabhattula Rajesh Kumar  wrote on 11/05/2015
> > > 05:51:29 AM:
> > >
> > > > From: Madabhattula Rajesh Kumar 
> > > > To: "user@spark.apache.org" ,
> > > > "u...@spark.incubator.apache.org" 
> > > > Date: 11/05/2015 05:51 AM
> > > > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> > > >
> > > > Hi,
> > >
> > > > Is this issue fixed in 1.5.1 version?
> > >
> > > > Regards,
> > > > Rajesh
>
>


Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-06 Thread Richard Hillegas

Hi Rajesh,

The 1.6 schedule is available on the front page of the Spark wiki:
https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage. I don't
know of any workarounds for this problem.

Thanks,
Rick


Madabhattula Rajesh Kumar  wrote on 11/05/2015
06:35:22 PM:

> From: Madabhattula Rajesh Kumar 
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "u...@spark.incubator.apache.org"
> , "user@spark.apache.org"
> 
> Date: 11/05/2015 06:35 PM
> Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi Richard,

> Thank you for the updates. Do you know tentative timeline for 1.6
> release? Mean while, any workaround solution for this issue?

> Regards,
> Rajesh
>

>
> On Thu, Nov 5, 2015 at 10:57 PM, Richard Hillegas 
wrote:
> Or you may be referring to
https://issues.apache.org/jira/browse/SPARK-10648
> . That issue has a couple pull requests but I think that the limited
> bandwidth of the committers still applies.
>
> Thanks,
> Rick
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Madabhattula Rajesh Kumar 
> > Cc: "user@spark.apache.org" ,
> > "u...@spark.incubator.apache.org" 
> > Date: 11/05/2015 09:17 AM
> > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> >
> > Hi Rajesh,
> >
> > I think that you may be referring to https://issues.apache.org/jira/
> > browse/SPARK-10909. A pull request on that issue was submitted more
> > than a month ago but it has not been committed. I think that the
> > committers are busy working on issues which were targeted for 1.6
> > and I doubt that they will have the spare cycles to vet that pull
request.
> >
> > Thanks,
> > Rick
> >
> >
> > Madabhattula Rajesh Kumar  wrote on 11/05/2015
> > 05:51:29 AM:
> >
> > > From: Madabhattula Rajesh Kumar 
> > > To: "user@spark.apache.org" ,
> > > "u...@spark.incubator.apache.org" 
> > > Date: 11/05/2015 05:51 AM
> > > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> > >
> > > Hi,
> >
> > > Is this issue fixed in 1.5.1 version?
> >
> > > Regards,
> > > Rajesh

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Madabhattula Rajesh Kumar
Hi Richard,

Thank you for the updates. Do you know the tentative timeline for the 1.6 release?
Meanwhile, is there any workaround for this issue?

Regards,
Rajesh



On Thu, Nov 5, 2015 at 10:57 PM, Richard Hillegas 
wrote:

> Or you may be referring to
> https://issues.apache.org/jira/browse/SPARK-10648. That issue has a
> couple pull requests but I think that the limited bandwidth of the
> committers still applies.
>
> Thanks,
> Rick
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Madabhattula Rajesh Kumar 
> > Cc: "user@spark.apache.org" ,
> > "u...@spark.incubator.apache.org" 
> > Date: 11/05/2015 09:17 AM
> > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> >
> > Hi Rajesh,
> >
> > I think that you may be referring to https://issues.apache.org/jira/
> > browse/SPARK-10909. A pull request on that issue was submitted more
> > than a month ago but it has not been committed. I think that the
> > committers are busy working on issues which were targeted for 1.6
> > and I doubt that they will have the spare cycles to vet that pull
> request.
> >
> > Thanks,
> > Rick
> >
> >
> > Madabhattula Rajesh Kumar  wrote on 11/05/2015
> > 05:51:29 AM:
> >
> > > From: Madabhattula Rajesh Kumar 
> > > To: "user@spark.apache.org" ,
> > > "u...@spark.incubator.apache.org" 
> > > Date: 11/05/2015 05:51 AM
> > > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> > >
> > > Hi,
> >
> > > Is this issue fixed in 1.5.1 version?
> >
> > > Regards,
> > > Rajesh
>
>


Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Richard Hillegas

Or you may be referring to
https://issues.apache.org/jira/browse/SPARK-10648. That issue has a couple
pull requests but I think that the limited bandwidth of the committers
still applies.

Thanks,
Rick


Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Madabhattula Rajesh Kumar 
> Cc: "user@spark.apache.org" ,
> "u...@spark.incubator.apache.org" 
> Date: 11/05/2015 09:17 AM
> Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi Rajesh,
>
> I think that you may be referring to https://issues.apache.org/jira/
> browse/SPARK-10909. A pull request on that issue was submitted more
> than a month ago but it has not been committed. I think that the
> committers are busy working on issues which were targeted for 1.6
> and I doubt that they will have the spare cycles to vet that pull
request.
>
> Thanks,
> Rick
>
>
> Madabhattula Rajesh Kumar  wrote on 11/05/2015
> 05:51:29 AM:
>
> > From: Madabhattula Rajesh Kumar 
> > To: "user@spark.apache.org" ,
> > "u...@spark.incubator.apache.org" 
> > Date: 11/05/2015 05:51 AM
> > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> >
> > Hi,
>
> > Is this issue fixed in 1.5.1 version?
>
> > Regards,
> > Rajesh

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Richard Hillegas

Hi Rajesh,

I think that you may be referring to
https://issues.apache.org/jira/browse/SPARK-10909. A pull request on that
issue was submitted more than a month ago but it has not been committed. I
think that the committers are busy working on issues which were targeted
for 1.6 and I doubt that they will have the spare cycles to vet that pull
request.

Thanks,
Rick


Madabhattula Rajesh Kumar  wrote on 11/05/2015
05:51:29 AM:

> From: Madabhattula Rajesh Kumar 
> To: "user@spark.apache.org" ,
> "u...@spark.incubator.apache.org" 
> Date: 11/05/2015 05:51 AM
> Subject: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi,

> Is this issue fixed in 1.5.1 version?

> Regards,
> Rajesh

Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Madabhattula Rajesh Kumar
Hi,

Is this issue fixed in 1.5.1 version?

Regards,
Rajesh


Re: databases currently supported by Spark SQL JDBC

2015-07-09 Thread ayan guha
I suppose every RDBMS has a JDBC driver to connect to. I know Oracle, MySQL,
SQL Server, Teradata, and Netezza have one.

On Thu, Jul 9, 2015 at 10:09 PM, Niranda Perera 
wrote:

> Hi,
>
> I'm planning to use Spark SQL JDBC datasource provider in various RDBMS
> databases.
>
> what are the databases currently supported by Spark JDBC relation provider?
>
> rgds
>
> --
> Niranda
> @n1r44 <https://twitter.com/N1R44>
> https://pythagoreanscript.wordpress.com/
>



-- 
Best Regards,
Ayan Guha


databases currently supported by Spark SQL JDBC

2015-07-09 Thread Niranda Perera
Hi,

I'm planning to use Spark SQL JDBC datasource provider in various RDBMS
databases.

what are the databases currently supported by Spark JDBC relation provider?

rgds

-- 
Niranda
@n1r44 <https://twitter.com/N1R44>
https://pythagoreanscript.wordpress.com/


Re: Spark SQL JDBC Source data skew

2015-06-25 Thread Sathish Kumaran Vairavelu
Can some one help me here? Please
On Sat, Jun 20, 2015 at 9:54 AM Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Hi,
>
> In the Spark SQL JDBC data source there is an option to specify the upper/lower
> bound and the number of partitions. How does Spark handle data distribution if we do
> not give the upper/lower bounds or the number of partitions? Will all data from the
> external data source be skewed into one executor?
>
> In many situations, we do not know the upper/lower bound of the underlying
> dataset until the query is executed, so it is not possible to pass
> upper/lower bound values.
>
>
> Thanks
>
> Sathish
>


Spark SQL JDBC Source data skew

2015-06-20 Thread Sathish Kumaran Vairavelu
Hi,

In the Spark SQL JDBC data source there is an option to specify the upper/lower
bound and the number of partitions. How does Spark handle data distribution if we do
not give the upper/lower bounds or the number of partitions? Will all data from the
external data source be skewed into one executor?

In many situations, we do not know the upper/lower bound of the underlying
dataset until the query is executed, so it is not possible to pass
upper/lower bound values.


Thanks

Sathish
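
To the best of my understanding, when partitionColumn/lowerBound/upperBound/numPartitions are not given, the JDBC source reads the whole table through a single partition, i.e. one task on one executor. A sketch of an explicitly partitioned read (placeholder connection details; the bounds would typically be fetched beforehand with something like SELECT MIN(id), MAX(id)):

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PartitionedJdbcReadSketch {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("PartitionedJdbcReadSketch"));
    SQLContext sqlContext = new SQLContext(jsc);

    Map<String, String> options = new HashMap<String, String>();
    options.put("url", "jdbc:postgresql://dbhost/db");   // placeholder source
    options.put("dbtable", "big_table");
    options.put("partitionColumn", "id");                // numeric column to split on
    options.put("lowerBound", "1");                      // e.g. SELECT MIN(id) run beforehand
    options.put("upperBound", "1000000");                // e.g. SELECT MAX(id) run beforehand
    options.put("numPartitions", "8");

    DataFrame df = sqlContext.read().format("jdbc").options(options).load();
    System.out.println(df.rdd().partitions().length);    // should report 8 partitions
    jsc.stop();
  }
}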


Re: Spark SQL JDBC Source Join Error

2015-06-14 Thread Sathish Kumaran Vairavelu
Thank you.. it works in Spark 1.4.

On Sun, Jun 14, 2015 at 3:51 PM Michael Armbrust 
wrote:

> Sounds like SPARK-5456 .
> Which is fixed in Spark 1.4.
>
> On Sun, Jun 14, 2015 at 11:57 AM, Sathish Kumaran Vairavelu <
> vsathishkuma...@gmail.com> wrote:
>
>> Hello Everyone,
>>
>> I pulled 2 different tables from the JDBC source and then joined them
>> using the cust_id *decimal* column. A simple join like as below. This
>> simple join works perfectly in the database but not in Spark SQL. I am
>> importing 2 tables as a data frame/registertemptable and firing sql on top
>> of it. Please let me know what could be the error..
>>
>> select b.customer_type, sum(a.amount) total_amount from
>> customer_activity a,
>> account b
>> where
>> a.cust_id = b.cust_id
>> group by b.customer_type
>>
>> CastException: java.math.BigDecimal cannot be cast to
>> org.apache.spark.sql.types.Decimal
>>
>> at
>> org.apache.spark.sql.types.Decimal$DecimalIsFractional$.plus(Decimal.scala:330)
>>
>> at
>> org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
>>
>> at
>> org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50)
>>
>> at
>> org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:83)
>>
>> at
>> org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571)
>>
>> at
>> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:163)
>>
>> at
>> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:147)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>>
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>
>> at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>
>> at java.lang.Thread.run(Thread.java:745)
>>
>
>


Re: Spark SQL JDBC Source Join Error

2015-06-14 Thread Michael Armbrust
Sounds like SPARK-5456 .
Which is fixed in Spark 1.4.

On Sun, Jun 14, 2015 at 11:57 AM, Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Hello Everyone,
>
> I pulled 2 different tables from the JDBC source and then joined them
> using the cust_id *decimal* column. A simple join like as below. This
> simple join works perfectly in the database but not in Spark SQL. I am
> importing 2 tables as a data frame/registertemptable and firing sql on top
> of it. Please let me know what could be the error..
>
> select b.customer_type, sum(a.amount) total_amount from
> customer_activity a,
> account b
> where
> a.cust_id = b.cust_id
> group by b.customer_type
>
> CastException: java.math.BigDecimal cannot be cast to
> org.apache.spark.sql.types.Decimal
>
> at
> org.apache.spark.sql.types.Decimal$DecimalIsFractional$.plus(Decimal.scala:330)
>
> at
> org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
>
> at
> org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50)
>
> at
> org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:83)
>
> at
> org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571)
>
> at
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:163)
>
> at
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:147)
>
> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>
> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
>
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>


Spark SQL JDBC Source Join Error

2015-06-14 Thread Sathish Kumaran Vairavelu
Hello Everyone,

I pulled 2 different tables from the JDBC source and then joined them using
the cust_id *decimal* column, a simple join like the one below. This simple join
works perfectly in the database but not in Spark SQL. I am importing the 2
tables as data frames, registering them as temp tables, and firing SQL on top of them.
Please let me know what the error could be.

select b.customer_type, sum(a.amount) total_amount from
customer_activity a,
account b
where
a.cust_id = b.cust_id
group by b.customer_type

CastException: java.math.BigDecimal cannot be cast to
org.apache.spark.sql.types.Decimal

at
org.apache.spark.sql.types.Decimal$DecimalIsFractional$.plus(Decimal.scala:330)

at
org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)

at
org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50)

at
org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:83)

at
org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571)

at
org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:163)

at
org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:147)

at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)

at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)

at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

at org.apache.spark.scheduler.Task.run(Task.scala:64)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)
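
As noted in the replies, this is SPARK-5456 and is fixed in Spark 1.4. For anyone stuck on an older release, one workaround that is sometimes suggested (my assumption, not something confirmed in this thread) is to keep the aggregation off the JDBC-sourced decimal by casting it first:

// Assumes the two JDBC tables are already registered as temp tables
// "customer_activity" and "account" on an existing SQLContext.
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class DecimalJoinWorkaroundSketch {
  public static DataFrame totals(SQLContext sqlContext) {
    return sqlContext.sql(
        "select b.customer_type, sum(cast(a.amount as double)) total_amount "
        + "from customer_activity a, account b "
        + "where a.cust_id = b.cust_id "
        + "group by b.customer_type");
  }
}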


RE: Anybody using Spark SQL JDBC server with DSE Cassandra?

2015-06-04 Thread Mohammed Guller
Deenar,
Thanks for the suggestion.
That is one of the ideas that I have, but I didn't get a chance to try it out yet.
One of the things that could potentially cause problems is that we use wide
rows. In addition, the schema is dynamic, with new columns getting added on a
regular basis. That is why I am considering DSE, which has the Spark SQL
Thrift/JDBC server integrated with Cassandra.

Mohammed

From: Deenar Toraskar [mailto:deenar.toras...@gmail.com]
Sent: Thursday, June 4, 2015 7:42 AM
To: Mohammed Guller
Cc: user@spark.apache.org
Subject: Re: Anybody using Spark SQL JDBC server with DSE Cassandra?

Mohammed

Have you tried registering your Cassandra tables in Hive/Spark SQL using the 
data frames API. These should be then available to query via the Spark 
SQL/Thrift JDBC Server.

Deenar

On 1 June 2015 at 19:33, Mohammed Guller <moham...@glassbeam.com> wrote:
Nobody using Spark SQL JDBC/Thrift server with DSE Cassandra?

Mohammed

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, May 29, 2015 11:49 AM
To: user@spark.apache.org
Subject: Anybody using Spark SQL JDBC server with DSE Cassandra?

Hi –

We have successfully integrated Spark SQL with Cassandra. We have a backend 
that provides a REST API that allows users to execute SQL queries on data in 
C*. Now we would like to also support JDBC/ODBC connectivity , so that user can 
use tools like Tableau to query data in C* through the Spark SQL JDBC server.

However, I have been unable to find a driver that would allow the Spark SQL 
Thrift/JDBC server to connect with Cassandra. DataStax provides a closed-source 
driver that comes only with the DSE version of Cassandra.

I would like to find out how many people are using the Spark SQL JDBC server + 
DSE Cassandra combination. If you do use Spark SQL JDBC server + DSE, I would 
appreciate if you could share your experience. For example, what kind of issues 
you have run into? How is the performance? What reporting tools you are using?

Thank  you.

Mohammed




Re: Anybody using Spark SQL JDBC server with DSE Cassandra?

2015-06-04 Thread Deenar Toraskar
Mohammed

Have you tried registering your Cassandra tables in Hive/Spark SQL using
the DataFrames API? These should then be available to query via the Spark
SQL Thrift/JDBC server.
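
A minimal sketch of what that registration could look like, assuming the spark-cassandra-connector is on the classpath and a Spark 1.4-style API; the keyspace and table names are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class CassandraRegistrationSketch {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("CassandraRegistrationSketch"));
    HiveContext hiveContext = new HiveContext(jsc.sc());

    DataFrame cass = hiveContext.read()
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "my_keyspace")   // placeholder
        .option("table", "my_table")         // placeholder
        .load();

    // A temp table is only visible inside this process; saving to the Hive
    // metastore (which materialises a copy as a managed table) makes the data
    // visible to the Thrift JDBC server as well.
    cass.registerTempTable("my_table_local");
    cass.write().saveAsTable("my_table_for_jdbc");
    jsc.stop();
  }
}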

Deenar

On 1 June 2015 at 19:33, Mohammed Guller  wrote:

>  Nobody using Spark SQL JDBC/Thrift server with DSE Cassandra?
>
>
>
> Mohammed
>
>
>
> *From:* Mohammed Guller [mailto:moham...@glassbeam.com]
> *Sent:* Friday, May 29, 2015 11:49 AM
> *To:* user@spark.apache.org
> *Subject:* Anybody using Spark SQL JDBC server with DSE Cassandra?
>
>
>
> Hi –
>
>
>
> We have successfully integrated Spark SQL with Cassandra. We have a
> backend that provides a REST API that allows users to execute SQL queries
> on data in C*. Now we would like to also support JDBC/ODBC connectivity ,
> so that user can use tools like Tableau to query data in C* through the
> Spark SQL JDBC server.
>
>
>
> However, I have been unable to find a driver that would allow the Spark
> SQL Thrift/JDBC server to connect with Cassandra. DataStax provides a
> closed-source driver that comes only with the DSE version of Cassandra.
>
>
>
> I would like to find out how many people are using the Spark SQL JDBC
> server + DSE Cassandra combination. If you do use Spark SQL JDBC server +
> DSE, I would appreciate if you could share your experience. For example,
> what kind of issues you have run into? How is the performance? What
> reporting tools you are using?
>
>
>
> Thank  you.
>
>
>
> Mohammed
>
>
>


RE: Anybody using Spark SQL JDBC server with DSE Cassandra?

2015-06-01 Thread Mohammed Guller
Nobody using Spark SQL JDBC/Thrift server with DSE Cassandra?

Mohammed

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, May 29, 2015 11:49 AM
To: user@spark.apache.org
Subject: Anybody using Spark SQL JDBC server with DSE Cassandra?

Hi -

We have successfully integrated Spark SQL with Cassandra. We have a backend 
that provides a REST API that allows users to execute SQL queries on data in 
C*. Now we would like to also support JDBC/ODBC connectivity , so that user can 
use tools like Tableau to query data in C* through the Spark SQL JDBC server.

However, I have been unable to find a driver that would allow the Spark SQL 
Thrift/JDBC server to connect with Cassandra. DataStax provides a closed-source 
driver that comes only with the DSE version of Cassandra.

I would like to find out how many people are using the Spark SQL JDBC server + 
DSE Cassandra combination. If you do use Spark SQL JDBC server + DSE, I would 
appreciate if you could share your experience. For example, what kind of issues 
you have run into? How is the performance? What reporting tools you are using?

Thank  you.

Mohammed



Anybody using Spark SQL JDBC server with DSE Cassandra?

2015-05-29 Thread Mohammed Guller
Hi -

We have successfully integrated Spark SQL with Cassandra. We have a backend 
that provides a REST API that allows users to execute SQL queries on data in 
C*. Now we would like to also support JDBC/ODBC connectivity , so that user can 
use tools like Tableau to query data in C* through the Spark SQL JDBC server.

However, I have been unable to find a driver that would allow the Spark SQL 
Thrift/JDBC server to connect with Cassandra. DataStax provides a closed-source 
driver that comes only with the DSE version of Cassandra.

I would like to find out how many people are using the Spark SQL JDBC server + 
DSE Cassandra combination. If you do use Spark SQL JDBC server + DSE, I would 
appreciate if you could share your experience. For example, what kind of issues 
you have run into? How is the performance? What reporting tools you are using?

Thank  you.

Mohammed



Re: What's the advantage features of Spark SQL(JDBC)

2015-05-15 Thread Yi Zhang
OK. Thanks. 


 On Friday, May 15, 2015 3:35 PM, "Cheng, Hao"  wrote:
   

Yes.

From: Yi Zhang [mailto:zhangy...@yahoo.com]
Sent: Friday, May 15, 2015 2:51 PM
To: Cheng, Hao; User
Subject: Re: What's the advantage features of Spark SQL(JDBC)

@Hao, As you said, there is no advantage feature for JDBC, it just provides unified api to support different data sources. Is it right?

On Friday, May 15, 2015 2:46 PM, "Cheng, Hao"  wrote:

Spark SQL just take the JDBC as a new data source, the same as we need to support loading data from a .csv or .json.

From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 2:30 PM
To: User
Subject: What's the advantage features of Spark SQL(JDBC)

Hi All,

Comparing direct access via JDBC, what's the advantage features of Spark SQL(JDBC) to access external data source?

Any tips are welcome! Thanks.

Regards, Yi

RE: What's the advantage features of Spark SQL(JDBC)

2015-05-15 Thread Cheng, Hao
Yes.

From: Yi Zhang [mailto:zhangy...@yahoo.com]
Sent: Friday, May 15, 2015 2:51 PM
To: Cheng, Hao; User
Subject: Re: What's the advantage features of Spark SQL(JDBC)

@Hao,
As you said, there is no advantage feature for JDBC, it just provides unified 
api to support different data sources. Is it right?


On Friday, May 15, 2015 2:46 PM, "Cheng, Hao" 
mailto:hao.ch...@intel.com>> wrote:

Spark SQL just take the JDBC as a new data source, the same as we need to 
support loading data from a .csv or .json.

From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 2:30 PM
To: User
Subject: What's the advantage features of Spark SQL(JDBC)

Hi All,

Comparing direct access via JDBC, what's the advantage features of Spark 
SQL(JDBC) to access external data source?

Any tips are welcome! Thanks.

Regards,
Yi




Re: What's the advantage features of Spark SQL(JDBC)

2015-05-14 Thread Yi Zhang
@Hao, As you said, there is no advantage feature for JDBC, it just provides a
unified API to support different data sources. Is it right?


 On Friday, May 15, 2015 2:46 PM, "Cheng, Hao"  wrote:
   

Spark SQL just take the JDBC as a new data source, the same as we need to support loading data from a .csv or .json.

From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 2:30 PM
To: User
Subject: What's the advantage features of Spark SQL(JDBC)

Hi All,

Comparing direct access via JDBC, what's the advantage features of Spark SQL(JDBC) to access external data source?

Any tips are welcome! Thanks.

Regards, Yi

  

RE: What's the advantage features of Spark SQL(JDBC)

2015-05-14 Thread Cheng, Hao
Spark SQL just takes JDBC as a new data source, in the same way that we
support loading data from a .csv or a .json file.
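
As a rough illustration of that point, here is a minimal sketch of reading a
JDBC table through the data source API (a sketch only: it assumes the Spark
1.4+ DataFrameReader API, a PostgreSQL driver jar on the classpath, and
made-up connection details):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical connection details; replace the URL, table, and credentials.
val sc = new SparkContext(new SparkConf().setAppName("jdbc-source-example"))
val sqlContext = new SQLContext(sc)

// JDBC is treated as just another data source, like csv or json.
val orders = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/sales")
  .option("dbtable", "public.orders")
  .option("user", "spark")
  .option("password", "secret")
  .load()

orders.registerTempTable("orders")
sqlContext.sql("SELECT COUNT(*) FROM orders").show()

The practical benefit over a raw JDBC connection is that the result comes back
as a DataFrame, so it can be joined with data read from any other source.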

From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 2:30 PM
To: User
Subject: What's the advantage features of Spark SQL(JDBC)

Hi All,

Comparing direct access via JDBC, what's the advantage features of Spark 
SQL(JDBC) to access external data source?

Any tips are welcome! Thanks.

Regards,
Yi



What's the advantage features of Spark SQL(JDBC)

2015-05-14 Thread Yi Zhang
Hi All,
Compared with direct access via JDBC, what are the advantage features of Spark
SQL (JDBC) for accessing an external data source?

Any tips are welcome! Thanks.

Regards,
Yi


Re: Spark-SQL JDBC driver

2014-12-14 Thread Michael Armbrust
I'll add that there is an experimental method that allows you to start the
JDBC server with an existing HiveContext (which might have registered
temporary tables).

https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42
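
For anyone who wants to try it, a minimal sketch of that approach follows (a
sketch only: the method was marked experimental, the demo data and table name
are made up, and Spark 1.3+ APIs are assumed):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Placeholder record type for the demo temp table.
case class Event(id: Int, name: String)

object ThriftWithTempTables {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("thrift-with-temp-tables"))
    val hiveContext = new HiveContext(sc)

    // Register a temporary table in this HiveContext (in-memory demo data).
    val events = hiveContext.createDataFrame(Seq(Event(1, "login"), Event(2, "logout")))
    events.registerTempTable("events_temp")

    // Start the JDBC/Thrift server on top of the *same* HiveContext, so JDBC
    // clients (e.g. beeline) connecting to it can query events_temp directly.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}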


On Thu, Dec 11, 2014 at 6:52 AM, Denny Lee  wrote:
>
> Yes, that is correct. A quick reference on this is the post
> https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
> with the pertinent section being:
>
> It is important to note that when you create Spark tables (for example,
> via the .registerTempTable) these are operating within the Spark
> environment which resides in a separate process than the Hive Metastore.
> This means that currently tables that are created within the Spark context
> are not available through the Thrift server. To achieve this, within the
> Spark context save your temporary table into Hive - then the Spark Thrift
> Server will be able to see the table.
>
> HTH!
>
>
> On Thu, Dec 11, 2014 at 04:09 Anas Mosaad  wrote:
>
>> Actually I came to a conclusion that RDDs has to be persisted in hive in
>> order to be able to access through thrift.
>> Hope I didn't end up with incorrect conclusion.
>> Please someone correct me if I am wrong.
>> On Dec 11, 2014 8:53 AM, "Judy Nash" 
>> wrote:
>>
>>>  Looks like you are wondering why you cannot see the RDD table you have
>>> created via thrift?
>>>
>>>
>>>
>>> Based on my own experience with spark 1.1, RDD created directly via
>>> Spark SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift,
>>> since thrift has its own session containing its own RDD.
>>>
>>> Spark SQL experts on the forum can confirm on this though.
>>>
>>>
>>>
>>> *From:* Cheng Lian [mailto:lian.cs@gmail.com]
>>> *Sent:* Tuesday, December 9, 2014 6:42 AM
>>> *To:* Anas Mosaad
>>> *Cc:* Judy Nash; user@spark.apache.org
>>> *Subject:* Re: Spark-SQL JDBC driver
>>>
>>>
>>>
>>> According to the stacktrace, you were still using SQLContext rather than
>>> HiveContext. To interact with Hive, HiveContext *must* be used.
>>>
>>> Please refer to this page
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>>>
>>>  On 12/9/14 6:26 PM, Anas Mosaad wrote:
>>>
>>>  Back to the first question, this will mandate that hive is up and
>>> running?
>>>
>>>
>>>
>>> When I try it, I get the following exception. The documentation says
>>> that this method works only on SchemaRDD. I though that
>>> countries.saveAsTable did not work for that a reason so I created a tmp
>>> that contains the results from the registered temp table. Which I could
>>> validate that it's a SchemaRDD as shown below.
>>>
>>>
>>>
>>>
>>> * @Judy,* I do really appreciate your kind support and I want to
>>> understand and off course don't want to wast your time. If you can direct
>>> me the documentation describing this details, this will be great.
>>>
>>>
>>>
>>> scala> val tmp = sqlContext.sql("select * from countries")
>>>
>>> tmp: org.apache.spark.sql.SchemaRDD =
>>>
>>> SchemaRDD[12] at RDD at SchemaRDD.scala:108
>>>
>>> == Query Plan ==
>>>
>>> == Physical Plan ==
>>>
>>> PhysicalRDD
>>> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
>>> MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>>>
>>>
>>>
>>> scala> tmp.saveAsTable("Countries")
>>>
>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
>>> Unresolved plan found, tree:
>>>
>>> 'CreateTableAsSelect None, Countries, false, None
>>>
>>>  Project
>>> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
>>>
>>>   Subquery countries
>>>
>>>LogicalRDD
>>> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,CO

Re: Spark-SQL JDBC driver

2014-12-11 Thread Denny Lee
Yes, that is correct. A quick reference on this is the post
https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
with the pertinent section being:

It is important to note that when you create Spark tables (for example, via
.registerTempTable), these operate within the Spark environment, which resides
in a separate process from the Hive Metastore. This means that tables created
within the Spark context are currently not available through the Thrift
server. To work around this, save your temporary table into Hive from within
the Spark context; the Spark Thrift Server will then be able to see the table.
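
A rough sketch of that workflow (a sketch only: it assumes Spark 1.4-style
reader/writer names, a Hive-enabled build that shares the metastore with the
Thrift server, and a placeholder input path):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("persist-to-hive"))
// HiveContext (not SQLContext) is what talks to the Hive metastore.
val hiveContext = new HiveContext(sc)

// A temp table lives only inside this Spark process; the Thrift server,
// which runs in its own process, cannot see it.
val countries = hiveContext.read.json("/tmp/countries.json")  // placeholder path
countries.registerTempTable("countries_temp")

// Persisting it as a real Hive table goes through the shared metastore,
// so the Spark Thrift server (and beeline) can now query it.
countries.write.saveAsTable("countries")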

HTH!

On Thu, Dec 11, 2014 at 04:09 Anas Mosaad  wrote:

> Actually I came to a conclusion that RDDs has to be persisted in hive in
> order to be able to access through thrift.
> Hope I didn't end up with incorrect conclusion.
> Please someone correct me if I am wrong.
> On Dec 11, 2014 8:53 AM, "Judy Nash" 
> wrote:
>
>>  Looks like you are wondering why you cannot see the RDD table you have
>> created via thrift?
>>
>>
>>
>> Based on my own experience with spark 1.1, RDD created directly via Spark
>> SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift, since
>> thrift has its own session containing its own RDD.
>>
>> Spark SQL experts on the forum can confirm on this though.
>>
>>
>>
>> *From:* Cheng Lian [mailto:lian.cs@gmail.com]
>> *Sent:* Tuesday, December 9, 2014 6:42 AM
>> *To:* Anas Mosaad
>> *Cc:* Judy Nash; user@spark.apache.org
>> *Subject:* Re: Spark-SQL JDBC driver
>>
>>
>>
>> According to the stacktrace, you were still using SQLContext rather than
>> HiveContext. To interact with Hive, HiveContext *must* be used.
>>
>> Please refer to this page
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>>
>>  On 12/9/14 6:26 PM, Anas Mosaad wrote:
>>
>>  Back to the first question, this will mandate that hive is up and
>> running?
>>
>>
>>
>> When I try it, I get the following exception. The documentation says that
>> this method works only on SchemaRDD. I though that countries.saveAsTable
>> did not work for that a reason so I created a tmp that contains the results
>> from the registered temp table. Which I could validate that it's a
>> SchemaRDD as shown below.
>>
>>
>>
>>
>> * @Judy,* I do really appreciate your kind support and I want to
>> understand and off course don't want to wast your time. If you can direct
>> me the documentation describing this details, this will be great.
>>
>>
>>
>> scala> val tmp = sqlContext.sql("select * from countries")
>>
>> tmp: org.apache.spark.sql.SchemaRDD =
>>
>> SchemaRDD[12] at RDD at SchemaRDD.scala:108
>>
>> == Query Plan ==
>>
>> == Physical Plan ==
>>
>> PhysicalRDD
>> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
>> MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>>
>>
>>
>> scala> tmp.saveAsTable("Countries")
>>
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
>> Unresolved plan found, tree:
>>
>> 'CreateTableAsSelect None, Countries, false, None
>>
>>  Project
>> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
>>
>>   Subquery countries
>>
>>LogicalRDD
>> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
>> MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>>
>>
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
>>
>> at
>

RE: Spark-SQL JDBC driver

2014-12-11 Thread Anas Mosaad
Actually, I came to the conclusion that RDDs have to be persisted in Hive in
order to be accessible through Thrift.
Hope I didn't end up with an incorrect conclusion.
Please, someone correct me if I am wrong.
On Dec 11, 2014 8:53 AM, "Judy Nash" 
wrote:

>  Looks like you are wondering why you cannot see the RDD table you have
> created via thrift?
>
>
>
> Based on my own experience with spark 1.1, RDD created directly via Spark
> SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift, since
> thrift has its own session containing its own RDD.
>
> Spark SQL experts on the forum can confirm on this though.
>
>
>
> *From:* Cheng Lian [mailto:lian.cs@gmail.com]
> *Sent:* Tuesday, December 9, 2014 6:42 AM
> *To:* Anas Mosaad
> *Cc:* Judy Nash; user@spark.apache.org
> *Subject:* Re: Spark-SQL JDBC driver
>
>
>
> According to the stacktrace, you were still using SQLContext rather than
> HiveContext. To interact with Hive, HiveContext *must* be used.
>
> Please refer to this page
> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
>  On 12/9/14 6:26 PM, Anas Mosaad wrote:
>
>  Back to the first question, this will mandate that hive is up and
> running?
>
>
>
> When I try it, I get the following exception. The documentation says that
> this method works only on SchemaRDD. I though that countries.saveAsTable
> did not work for that a reason so I created a tmp that contains the results
> from the registered temp table. Which I could validate that it's a
> SchemaRDD as shown below.
>
>
>
>
> * @Judy,* I do really appreciate your kind support and I want to
> understand and off course don't want to wast your time. If you can direct
> me the documentation describing this details, this will be great.
>
>
>
> scala> val tmp = sqlContext.sql("select * from countries")
>
> tmp: org.apache.spark.sql.SchemaRDD =
>
> SchemaRDD[12] at RDD at SchemaRDD.scala:108
>
> == Query Plan ==
>
> == Physical Plan ==
>
> PhysicalRDD
> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
> MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>
>
>
> scala> tmp.saveAsTable("Countries")
>
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> plan found, tree:
>
> 'CreateTableAsSelect None, Countries, false, None
>
>  Project
> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
>
>   Subquery countries
>
>LogicalRDD
> [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
> MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLCon

RE: Spark-SQL JDBC driver

2014-12-10 Thread Judy Nash
It looks like you are wondering why you cannot see the RDD table you have
created via Thrift?

Based on my own experience with Spark 1.1, an RDD created directly via Spark SQL
(i.e. Spark Shell or Spark-SQL.sh) is not visible on Thrift, since Thrift has
its own session containing its own RDDs.
Spark SQL experts on the forum can confirm this, though.

From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Tuesday, December 9, 2014 6:42 AM
To: Anas Mosaad
Cc: Judy Nash; user@spark.apache.org
Subject: Re: Spark-SQL JDBC driver

According to the stacktrace, you were still using SQLContext rather than 
HiveContext. To interact with Hive, HiveContext *must* be used.

Please refer to this page 
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

On 12/9/14 6:26 PM, Anas Mosaad wrote:
Back to the first question, this will mandate that hive is up and running?

When I try it, I get the following exception. The documentation says that this 
method works only on SchemaRDD. I though that countries.saveAsTable did not 
work for that a reason so I created a tmp that contains the results from the 
registered temp table. Which I could validate that it's a SchemaRDD as shown 
below.


@Judy, I do really appreciate your kind support and I want to understand and 
off course don't want to wast your time. If you can direct me the documentation 
describing this details, this will be great.


scala> val tmp = sqlContext.sql("select * from countries")

tmp: org.apache.spark.sql.SchemaRDD =

SchemaRDD[12] at RDD at SchemaRDD.scala:108

== Query Plan ==

== Physical Plan ==

PhysicalRDD 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



scala> tmp.saveAsTable("Countries")

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan 
found, tree:

'CreateTableAsSelect None, Countries, false, None

 Project 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]

  Subquery countries

   LogicalRDD 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

at 
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)

at 
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)

at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)

at scala.collection.immutable.List.foreach(List.scala:318)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)

at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)

at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)

at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)

at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)

at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)

at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)

at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)

at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)

at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)

at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)

at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala

Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
n.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)


at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


at java.lang.reflect.Method.invoke(Method.java:606)

at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)


at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)


at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)

at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)

at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)

at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)


at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)


at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)

at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)

at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)

at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)

at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)


at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)


at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)


at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)


at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)

at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)

at org.apache.spark.repl.Main$.main(Main.scala:31)

at org.apache.spark.repl.Main.main(Main.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)


at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <mailto:lian.cs@gmail.com>> wrote:


How did you register the table under spark-shell? Two things to
notice:

1. To interact with Hive, HiveContext instead of SQLContext must
be used.
2. `registerTempTable` doesn't persist the table into Hive
metastore, and the table is lost after quitting spark-shell.
Instead, you must use `saveAsTable`.


On 12/9/14 5:27 PM, Anas Mosaad wrote:

Thanks Cheng,

I thought spark-sql is using the same exact metastore, right?
However, it didn't work as expected. Here's what I did.

In spark-shell, I loaded a csv files and registered the table,
say countries.
Started the thrift server.
Connected using beeline. When I run show tables or !tables, I get
empty list of tables as follow:

0: jdbc:hive2://localhost:1> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+

0: jdbc:hive2://localhost:1> show tables ;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)

0: jdbc:hive2://localhost:1>



Kindly advice, what am I missing? I want to read the RDD using
SQL from outside spark-shell (i.e. like any other relational
database)


On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian
    mailto:lian.cs@gmail.com>> wrote:

Essentially, the Spark SQL JDBC Thrift server is just a Spark
port of HiveServer2. You don't need to run Hive, but you do
need a working Metastore.


On 12/9/14 3:59 PM, Anas Mosaad wrote:

Thanks Judy, this is exactly what I'm looking for. However,
and plz forgive me if it's a dump question is: It seems to
me that thrift is the same as hive2 JDBC driver, does this
mean that starting thrift will start hive as well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash
mailto:judyn...@exchange.microsoft.com>> wrote:

You can use thrift server for this purpose then test it
with beeline.

See doc:


https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

*From:*Anas Mosaad [mailto:anas.mos...@incorta.com
<mailto:anas.mos...@incorta.com>]
*Sent:* Monday, December 8, 2014 11:01 AM
    *To:* user@spark.apache.org <mailto:user@spark.apache.org>
*Subject:* Spark-SQL JDBC driver

Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
est.loadAndRun(SparkIMain.scala:1125)

at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)

at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)

at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)

at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)

at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)

at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)

at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)

at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)

at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)

at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)

at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)

at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)

at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)

at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)

at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)

at org.apache.spark.repl.Main$.main(Main.scala:31)

at org.apache.spark.repl.Main.main(Main.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian  wrote:

>  How did you register the table under spark-shell? Two things to notice:
>
> 1. To interact with Hive, HiveContext instead of SQLContext must be used.
> 2. `registerTempTable` doesn't persist the table into Hive metastore, and
> the table is lost after quitting spark-shell. Instead, you must use
> `saveAsTable`.
>
>
> On 12/9/14 5:27 PM, Anas Mosaad wrote:
>
> Thanks Cheng,
>
>  I thought spark-sql is using the same exact metastore, right? However,
> it didn't work as expected. Here's what I did.
>
>  In spark-shell, I loaded a csv files and registered the table, say
> countries.
> Started the thrift server.
> Connected using beeline. When I run show tables or !tables, I get empty
> list of tables as follow:
>
>  0: jdbc:hive2://localhost:1> !tables
>  +------------+--------------+-------------+-------------+----------+
>  | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
>  +------------+--------------+-------------+-------------+----------+
>  +------------+--------------+-------------+-------------+----------+
>
>  0: jdbc:hive2://localhost:1> show tables ;
>  +---------+
>  | result  |
>  +---------+
>  +---------+
>  No rows selected (0.106 seconds)
>
>  0: jdbc:hive2://localhost:1>
>
>
>
>  Kindly advice, what am I missing? I want to read the RDD using SQL from
> outside spark-shell (i.e. like any other relational database)
>
>
> On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian  wrote:
>
>>  Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
>> HiveServer2. You don't need to run Hive, but you do need a working
>> Metastore.
>>
>>
>> On 12/9/14 3:59 PM, Anas Mosaad wrote:
>>
>> Thanks Judy, this is exactly what I'm looking for. However, and plz
>> forgive me if it's a dump question is: It seems to me that thrift is the
>> same as hive2 JDBC driver, does this mean that starting thrift will start
>> hive as well on the server?
>>
>> On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash <
>> judyn...@exchange.microsoft.com> wrote:
>>
>>>  You can use thrift server for this purpose then test it with beeline.
>>>
>>>
>>>
>>> See doc:
>>>
>>>
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
>>>
>>>
>>>
>>>
>>>
>>> *From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
>>> *Sent:* Monday, December 8, 2014 11:01 AM
>>> *To:* user@spark.apache.org
>>> *Subject:* Spark-SQL JDBC driver
>>>
>>>
>>>
>>> Hello Everyone,
>>>
>>>
>>>
>>> I'm brand new to spark and was wondering if there's a JDBC driver to
>>> access spark-SQL directly. I'm running spark in standalone mode and don't
>>> have hadoop in this environment.
>>>
>>>
>>>
>>> --
>>>
>>>
>>>
>>> *Best Regards/أطيب المنى,*
>>>
>>>
>>>
>>> *Anas Mosaad*
>>>
>>>
>>>
>>
>>
>>
>>  --
>>
>> *Best Regards/أطيب المنى,*
>>
>>  *Anas Mosaad*
>> *Incorta Inc.*
>> *+20-100-743-4510*
>>
>>
>>
>
>
>  --
>
> *Best Regards/أطيب المنى,*
>
>  *Anas Mosaad*
> *Incorta Inc.*
> *+20-100-743-4510*
>
>
>


-- 

*Best Regards/أطيب المنى,*

*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*


Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian

How did you register the table under spark-shell? Two things to notice:

1. To interact with Hive, HiveContext instead of SQLContext must be used.
2. `registerTempTable` doesn't persist the table into the Hive metastore,
and the table is lost after quitting spark-shell. Instead, you must use
`saveAsTable`.
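
To make the two points concrete, a small sketch (it assumes a Hive-enabled
build, that `sc` comes from spark-shell, and that the source table name is a
placeholder):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

// A plain SQLContext cannot resolve metastore operations such as
// CREATE TABLE AS SELECT, which is why saveAsTable fails against it.
val plainContext = new SQLContext(sc)

// HiveContext is a superset of SQLContext that talks to the Hive metastore.
val hiveContext = new HiveContext(sc)

// registerTempTable: session-only, gone once the shell exits.
val countries = hiveContext.sql("SELECT * FROM some_source_table")  // placeholder
countries.registerTempTable("countries_temp")

// saveAsTable: written through the metastore, survives the shell and is
// visible to other clients such as the Thrift server.
countries.saveAsTable("countries")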


On 12/9/14 5:27 PM, Anas Mosaad wrote:

Thanks Cheng,

I thought spark-sql is using the same exact metastore, right? However, 
it didn't work as expected. Here's what I did.


In spark-shell, I loaded a csv files and registered the table, say 
countries.

Started the thrift server.
Connected using beeline. When I run show tables or !tables, I get 
empty list of tables as follow:


0: jdbc:hive2://localhost:1> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+

0: jdbc:hive2://localhost:1> show tables ;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)

0: jdbc:hive2://localhost:1>



Kindly advice, what am I missing? I want to read the RDD using SQL 
from outside spark-shell (i.e. like any other relational database)



On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <mailto:lian.cs@gmail.com>> wrote:


    Essentially, the Spark SQL JDBC Thrift server is just a Spark port
of HiveServer2. You don't need to run Hive, but you do need a
working Metastore.


On 12/9/14 3:59 PM, Anas Mosaad wrote:

Thanks Judy, this is exactly what I'm looking for. However, and
plz forgive me if it's a dump question is: It seems to me that
thrift is the same as hive2 JDBC driver, does this mean that
starting thrift will start hive as well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash
mailto:judyn...@exchange.microsoft.com>> wrote:

You can use thrift server for this purpose then test it with
beeline.

See doc:


https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

*From:*Anas Mosaad [mailto:anas.mos...@incorta.com
<mailto:anas.mos...@incorta.com>]
*Sent:* Monday, December 8, 2014 11:01 AM
*To:* user@spark.apache.org <mailto:user@spark.apache.org>
*Subject:* Spark-SQL JDBC driver

Hello Everyone,

I'm brand new to spark and was wondering if there's a JDBC
driver to access spark-SQL directly. I'm running spark in
standalone mode and don't have hadoop in this environment.

-- 


*Best Regards/أطيب المنى,*

*Anas Mosaad*




-- 


*Best Regards/أطيب المنى,*
*
*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*





--

*Best Regards/أطيب المنى,*
*
*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*




Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Thanks Cheng,

I thought spark-sql was using the exact same metastore, right? However, it
didn't work as expected. Here's what I did.

In spark-shell, I loaded a csv file and registered the table, say countries.
Started the thrift server.
Connected using beeline. When I run show tables or !tables, I get an empty
list of tables, as follows:

0: jdbc:hive2://localhost:1> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+

0: jdbc:hive2://localhost:1> show tables ;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)

0: jdbc:hive2://localhost:1>



Kindly advise: what am I missing? I want to read the RDD using SQL from
outside spark-shell (i.e. like any other relational database).


On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian  wrote:

>  Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
> HiveServer2. You don't need to run Hive, but you do need a working
> Metastore.
>
>
> On 12/9/14 3:59 PM, Anas Mosaad wrote:
>
> Thanks Judy, this is exactly what I'm looking for. However, and plz
> forgive me if it's a dump question is: It seems to me that thrift is the
> same as hive2 JDBC driver, does this mean that starting thrift will start
> hive as well on the server?
>
> On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash  > wrote:
>
>>  You can use thrift server for this purpose then test it with beeline.
>>
>>
>>
>> See doc:
>>
>>
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
>>
>>
>>
>>
>>
>> *From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
>> *Sent:* Monday, December 8, 2014 11:01 AM
>> *To:* user@spark.apache.org
>> *Subject:* Spark-SQL JDBC driver
>>
>>
>>
>> Hello Everyone,
>>
>>
>>
>> I'm brand new to spark and was wondering if there's a JDBC driver to
>> access spark-SQL directly. I'm running spark in standalone mode and don't
>> have hadoop in this environment.
>>
>>
>>
>> --
>>
>>
>>
>> *Best Regards/أطيب المنى,*
>>
>>
>>
>> *Anas Mosaad*
>>
>>
>>
>
>
>
>  --
>
> *Best Regards/أطيب المنى,*
>
>  *Anas Mosaad*
> *Incorta Inc.*
> *+20-100-743-4510*
>
>
>


-- 

*Best Regards/أطيب المنى,*

*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*


Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
Essentially, the Spark SQL JDBC Thrift server is just a Spark port of 
HiveServer2. You don't need to run Hive, but you do need a working 
Metastore.


On 12/9/14 3:59 PM, Anas Mosaad wrote:
Thanks Judy, this is exactly what I'm looking for. However, and plz 
forgive me if it's a dump question is: It seems to me that thrift is 
the same as hive2 JDBC driver, does this mean that starting thrift 
will start hive as well on the server?


On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash 
<mailto:judyn...@exchange.microsoft.com>> wrote:


You can use thrift server for this purpose then test it with beeline.

See doc:


https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

*From:*Anas Mosaad [mailto:anas.mos...@incorta.com
<mailto:anas.mos...@incorta.com>]
*Sent:* Monday, December 8, 2014 11:01 AM
*To:* user@spark.apache.org <mailto:user@spark.apache.org>
*Subject:* Spark-SQL JDBC driver

Hello Everyone,

I'm brand new to spark and was wondering if there's a JDBC driver
to access spark-SQL directly. I'm running spark in standalone mode
and don't have hadoop in this environment.

-- 


*Best Regards/أطيب المنى,*

*Anas Mosaad*




--

*Best Regards/أطيب المنى,*
*
*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*




Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Thanks Judy, this is exactly what I'm looking for. However, and please forgive
me if it's a dumb question: it seems to me that Thrift is the same as the
hive2 JDBC driver; does this mean that starting Thrift will start Hive as
well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash 
wrote:

>  You can use thrift server for this purpose then test it with beeline.
>
>
>
> See doc:
>
>
> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
>
>
>
>
>
> *From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
> *Sent:* Monday, December 8, 2014 11:01 AM
> *To:* user@spark.apache.org
> *Subject:* Spark-SQL JDBC driver
>
>
>
> Hello Everyone,
>
>
>
> I'm brand new to spark and was wondering if there's a JDBC driver to
> access spark-SQL directly. I'm running spark in standalone mode and don't
> have hadoop in this environment.
>
>
>
> --
>
>
>
> *Best Regards/أطيب المنى,*
>
>
>
> *Anas Mosaad*
>
>
>



-- 

*Best Regards/أطيب المنى,*

*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*


RE: Spark-SQL JDBC driver

2014-12-08 Thread Judy Nash
You can use the Thrift server for this purpose and then test it with beeline.

See doc:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
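
Once the Thrift server is up, any HiveServer2-compatible JDBC client can
connect to it, not just beeline. A minimal sketch in Scala using the Hive
JDBC driver (host, port, and credentials are placeholders, and the hive-jdbc
jar must be on the client classpath):

import java.sql.DriverManager

// The Spark Thrift server speaks the HiveServer2 protocol, so the standard
// Hive JDBC driver and a jdbc:hive2:// URL are used to connect to it.
Class.forName("org.apache.hive.jdbc.HiveDriver")

// The default Thrift server port is 10000; adjust host/user/password as needed.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "spark", "")
try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SHOW TABLES")
  while (rs.next()) {
    println(rs.getString(1))
  }
} finally {
  conn.close()
}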


From: Anas Mosaad [mailto:anas.mos...@incorta.com]
Sent: Monday, December 8, 2014 11:01 AM
To: user@spark.apache.org
Subject: Spark-SQL JDBC driver

Hello Everyone,

I'm brand new to spark and was wondering if there's a JDBC driver to access 
spark-SQL directly. I'm running spark in standalone mode and don't have hadoop 
in this environment.

--

Best Regards/أطيب المنى,

Anas Mosaad



Spark-SQL JDBC driver

2014-12-08 Thread Anas Mosaad
Hello Everyone,

I'm brand new to Spark and was wondering if there's a JDBC driver to access
Spark SQL directly. I'm running Spark in standalone mode and don't have
Hadoop in this environment.

-- 

*Best Regards/أطيب المنى,*

*Anas Mosaad*


RE: Spark SQL JDBC

2014-09-11 Thread Cheng, Hao
I copied the 3 datanucleus jars (datanucleus-api-jdo-3.2.1.jar,
datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar) to the folder lib/
manually, and it works for me.

From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 11:28 AM
To: alexandria1101
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark SQL JDBC

When you re-ran sbt did you clear out the packages first and ensure that the 
datanucleus jars were generated within lib_managed?  I remembered having to do 
that when I was working testing out different configs.

On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 
mailto:alexandria.shea...@gmail.com>> wrote:
Even when I comment out those 3 lines, I still get the same error.  Did
someone solve this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt, did you clear out the packages first and ensure that
the datanucleus jars were generated within lib_managed? I remember having
to do that when I was testing out different configs.

On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 <
alexandria.shea...@gmail.com> wrote:

> Even when I comment out those 3 lines, I still get the same error.  Did
> someone solve this?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>


Re: Spark SQL JDBC

2014-09-11 Thread alexandria1101
Even when I comment out those 3 lines, I still get the same error.  Did
someone solve this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark SQL JDBC

2014-08-12 Thread Cheng Lian
Oh, thanks for reporting this. This should be a bug: since SPARK_HIVE was
deprecated, we shouldn't rely on it any more.


On Wed, Aug 13, 2014 at 1:23 PM, ZHENG, Xu-dong  wrote:

> Just find this is because below lines in make_distribution.sh doesn't work:
>
> if [ "$SPARK_HIVE" == "true" ]; then
>   cp "$FWDIR"/lib_managed/jars/datanucleus*.jar "$DISTDIR/lib/"
> fi
>
> There is no definition of $SPARK_HIVE in make_distribution.sh. I should
> set it explicitly.
>
>
>
> On Wed, Aug 13, 2014 at 1:10 PM, ZHENG, Xu-dong  wrote:
>
>> Hi Cheng,
>>
>> I also meet some issues when I try to start ThriftServer based a build
>> from master branch (I could successfully run it from the branch-1.0-jdbc
>> branch). Below is my build command:
>>
>> ./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive -Pyarn
>> -Dyarn.version=2.4.0 -Dhadoop.version=2.4.0 -Phive-thriftserver
>>
>> And below are the printed errors:
>>
>> ERROR CompositeService: Error starting services HiveServer2
>> org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
>> at
>> org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
>> at
>> org.apache.hive.service.CompositeService.start(CompositeService.java:70)
>> at
>> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
>> at
>> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:71)
>> at
>> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:314)
>>  at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: javax.jdo.JDOFatalUserException: Class
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>> NestedThrowables:
>> java.lang.ClassNotFoundException:
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory
>> at
>> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
>> at
>> javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>> at
>> javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>> at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>> at
>> org.apache.hadoop.hive.metastore.RetryingRawStore.(RetryingRawStore.java:64)
>> at
>> org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:286)
>> at
>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:54)
>> at
>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:121)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:104)
>> at
>> org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
>> ... 11 more
>> Caused by: java.lang.ClassNotFoundException:
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>   

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
Just found that this is because the lines below in make_distribution.sh don't work:

if [ "$SPARK_HIVE" == "true" ]; then
  cp "$FWDIR"/lib_managed/jars/datanucleus*.jar "$DISTDIR/lib/"
fi

There is no definition of $SPARK_HIVE in make_distribution.sh. I should set
it explicitly.



On Wed, Aug 13, 2014 at 1:10 PM, ZHENG, Xu-dong  wrote:

> Hi Cheng,
>
> I also meet some issues when I try to start ThriftServer based a build
> from master branch (I could successfully run it from the branch-1.0-jdbc
> branch). Below is my build command:
>
> ./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive -Pyarn
> -Dyarn.version=2.4.0 -Dhadoop.version=2.4.0 -Phive-thriftserver
>
> And below are the printed errors:
>
> ERROR CompositeService: Error starting services HiveServer2
> org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
> at org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
> at
> org.apache.hive.service.CompositeService.start(CompositeService.java:70)
> at
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
> at
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:71)
> at
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:314)
>  at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: javax.jdo.JDOFatalUserException: Class
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
> NestedThrowables:
> java.lang.ClassNotFoundException:
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory
> at
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
> at
> javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
> at
> javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at
> org.apache.hadoop.hive.metastore.RetryingRawStore.(RetryingRawStore.java:64)
> at
> org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:286)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:54)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:121)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:104)
> at org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
> ... 11 more
> Caused by: java.lang.ClassNotFoundException:
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
> at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
>

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
Hi Cheng,

I also met some issues when I tried to start the ThriftServer based on a build
from the master branch (I could successfully run it from the branch-1.0-jdbc
branch). Below is my build command:

./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive -Pyarn
-Dyarn.version=2.4.0 -Dhadoop.version=2.4.0 -Phive-thriftserver

And below are the printed errors:

ERROR CompositeService: Error starting services HiveServer2
org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
at org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
at
org.apache.hive.service.CompositeService.start(CompositeService.java:70)
at
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:71)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:314)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.jdo.JDOFatalUserException: Class
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
at
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
at
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at
org.apache.hadoop.hive.metastore.RetryingRawStore.(RetryingRawStore.java:64)
at
org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:286)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:54)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at
org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:121)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:104)
at org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
... 11 more
Caused by: java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
... 32 more
14/08/13 13:08:48 INFO AbstractService: Service:OperationManager is stopped.
14/08/13 13:08:48 INFO AbstractService: Service:SessionManager is stopped.
14/08/13 13:08:48 INFO AbstractService: Service:CLIService is stopped.
14/08/13 13:08:48 ERROR HiveThriftServer2: Erro

Re: Spark SQL JDBC

2014-08-12 Thread Michael Armbrust
Hive pulls in a ton of dependencies that we were afraid would break
existing spark applications.  For this reason all hive submodules are
optional.


On Tue, Aug 12, 2014 at 7:43 AM, John Omernik  wrote:

> Yin helped me with that, and I appreciate the onlist followup.  A few
> questions: Why is this the case?  I guess, does building it with
> thriftserver add much more time/size to the final build? It seems that
> unless documented well, people will miss that and this situation would
> occur, why would we not just build the thrift server in? (I am not a
> programming expert, and not trying to judge the decision to have it in a
> separate profile, I would just like to understand why it'd done that way)
>
>
>
>
> On Mon, Aug 11, 2014 at 11:47 AM, Cheng Lian 
> wrote:
>
>> Hi John, the JDBC Thrift server resides in its own build profile and need
>> to be enabled explicitly by ./sbt/sbt -Phive-thriftserver assembly.
>> ​
>>
>>
>> On Tue, Aug 5, 2014 at 4:54 AM, John Omernik  wrote:
>>
>>> I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar
>>> with the JDBC thrift server.  I have everything compiled correctly, I can
>>> access data in spark-shell on yarn from my hive installation. Cached
>>> tables, etc all work.
>>>
>>> When I execute ./sbin/start-thriftserver.sh
>>>
>>> I get the error below. Shouldn't it just ready my spark-env? I guess I
>>> am lost on how to make this work.
>>>
>>> Thanks1
>>>
>>> $ ./start-thriftserver.sh
>>>
>>>
>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>> classpath
>>>
>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
>>>
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>
>>> at java.lang.Class.forName0(Native Method)
>>>
>>> at java.lang.Class.forName(Class.java:270)
>>>
>>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)
>>>
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
>>>
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>
>>
>


Re: Spark SQL JDBC

2014-08-12 Thread John Omernik
Yin helped me with that, and I appreciate the on-list follow-up. A few
questions: Why is this the case? I guess, does building it with the
thriftserver add much more time/size to the final build? It seems that
unless this is documented well, people will miss it and this situation will
keep occurring, so why would we not just build the thrift server in? (I am
not a programming expert, and not trying to judge the decision to have it in
a separate profile; I would just like to understand why it's done that way.)




On Mon, Aug 11, 2014 at 11:47 AM, Cheng Lian  wrote:

> Hi John, the JDBC Thrift server resides in its own build profile and needs
> to be enabled explicitly by ./sbt/sbt -Phive-thriftserver assembly.
>
>
> On Tue, Aug 5, 2014 at 4:54 AM, John Omernik  wrote:
>
>> I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar with
>> the JDBC thrift server.  I have everything compiled correctly, I can access
>> data in spark-shell on yarn from my hive installation. Cached tables, etc
>> all work.
>>
>> When I execute ./sbin/start-thriftserver.sh
>>
>> I get the error below. Shouldn't it just read my spark-env? I guess I am
>> lost on how to make this work.
>>
>> Thanks!
>>
>> $ ./start-thriftserver.sh
>>
>>
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>>
>> Exception in thread "main" java.lang.ClassNotFoundException:
>> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
>>
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>
>> at java.security.AccessController.doPrivileged(Native Method)
>>
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>
>> at java.lang.Class.forName0(Native Method)
>>
>> at java.lang.Class.forName(Class.java:270)
>>
>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)
>>
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
>>
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>
>


Re: Spark SQL JDBC

2014-08-11 Thread Cheng Lian
Hi John, the JDBC Thrift server resides in its own build profile and needs
to be enabled explicitly by ./sbt/sbt -Phive-thriftserver assembly.
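
A quick way to confirm whether a given assembly was actually built with that
profile is to look for the Thrift server class on its classpath. This is only an
illustrative sketch (the object name is made up; the class name comes from the
stack trace below):

    // Run with the Spark assembly jar on the classpath.
    object CheckThriftServer {
      def main(args: Array[String]): Unit = {
        try {
          Class.forName("org.apache.spark.sql.hive.thriftserver.HiveThriftServer2")
          println("HiveThriftServer2 found: assembly includes the hive-thriftserver module")
        } catch {
          case _: ClassNotFoundException =>
            println("HiveThriftServer2 missing: rebuild with ./sbt/sbt -Phive-thriftserver assembly")
        }
      }
    }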


On Tue, Aug 5, 2014 at 4:54 AM, John Omernik  wrote:

> I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar with
> the JDBC thrift server.  I have everything compiled correctly, I can access
> data in spark-shell on yarn from my hive installation. Cached tables, etc
> all work.
>
> When I execute ./sbin/start-thriftserver.sh
>
> I get the error below. Shouldn't it just read my spark-env? I guess I am
> lost on how to make this work.
>
> Thanks!
>
> $ ./start-thriftserver.sh
>
>
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
>
> Exception in thread "main" java.lang.ClassNotFoundException:
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>
> at java.lang.Class.forName0(Native Method)
>
> at java.lang.Class.forName(Class.java:270)
>
> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)
>
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
>
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>


Spark SQL JDBC

2014-08-04 Thread John Omernik
I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar with
the JDBC thrift server.  I have everything compiled correctly, I can access
data in spark-shell on yarn from my hive installation. Cached tables, etc
all work.

When I execute ./sbin/start-thriftserver.sh

I get the error below. Shouldn't it just read my spark-env? I guess I am
lost on how to make this work.

Thanks!

$ ./start-thriftserver.sh


Spark assembly has been built with Hive, including Datanucleus jars on
classpath

Exception in thread "main" java.lang.ClassNotFoundException:
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:270)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Re: Spark SQL JDBC Connectivity

2014-07-30 Thread Michael Armbrust
Very cool.  Glad you found a solution that works.


On Wed, Jul 30, 2014 at 1:04 PM, Venkat Subramanian 
wrote:

> For the time being, we decided to take a different route. We created a REST
> API layer in our app and allow SQL queries to be passed in via REST. Internally
> we pass each query to the Spark SQL layer on the RDD and return the
> results. With this, Spark SQL is supported for our RDDs via this REST API
> now. It was easy to do, took just a few hours, and it works for our
> use case.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p10986.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: Spark SQL JDBC Connectivity

2014-07-30 Thread Venkat Subramanian
For the time being, we decided to take a different route. We created a REST
API layer in our app and allow SQL queries to be passed in via REST. Internally
we pass each query to the Spark SQL layer on the RDD and return the
results. With this, Spark SQL is supported for our RDDs via this REST API
now. It was easy to do, took just a few hours, and it works for our
use case.
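
As an illustration only (endpoint, port, and names are made up, not Venkat's
actual code), the pattern can be as small as a single HTTP handler that feeds the
posted SQL text to an existing SQLContext:

    import java.net.InetSocketAddress
    import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object SqlRestServer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sql-rest"))
        val sqlContext = new SQLContext(sc)
        // ...register the application's RDDs as tables on sqlContext here...

        val server = HttpServer.create(new InetSocketAddress(8090), 0)
        server.createContext("/sql", new HttpHandler {
          override def handle(exchange: HttpExchange): Unit = {
            // The request body is the SQL text; run it and return the rows as text.
            val query = scala.io.Source.fromInputStream(exchange.getRequestBody).mkString
            val result =
              try sqlContext.sql(query).collect().mkString("\n")
              catch { case e: Exception => s"ERROR: ${e.getMessage}" }
            val bytes = result.getBytes("UTF-8")
            exchange.sendResponseHeaders(200, bytes.length)
            exchange.getResponseBody.write(bytes)
            exchange.close()
          }
        })
        server.start()
      }
    }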



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p10986.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark SQL JDBC Connectivity and more

2014-06-09 Thread Michael Armbrust
> [Venkat] Are you saying: pull the SharkServer2 code into my standalone
> spark application (as part of the standalone application process), pass in
> the spark context of the standalone app to the SharkServer2 SparkContext at
> startup, and voila, we get SQL/JDBC interfaces for the RDDs of the
> standalone app that are exposed as tables? Thanks for the clarification.
>

Yeah, that should work, although it is pretty hacky and not officially
supported.  It might be interesting to augment Shark to allow the user to
invoke custom applications using the same SQLContext.  If this is something
you'd have time to implement, I'd be happy to discuss the design further.
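
Later Spark releases expose this idea directly as
HiveThriftServer2.startWithContext; a rough sketch under the assumption that your
Spark build ships that method (check your version) and that tables have already
been registered on the context:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    object EmbeddedJdbcServer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("embedded-jdbc"))
        val hiveContext = new HiveContext(sc)

        // Register the application's own RDDs as tables on hiveContext first,
        // then start the Thrift/JDBC server against that same context.
        HiveThriftServer2.startWithContext(hiveContext)

        // Keep the driver alive so JDBC clients can connect (port 10000 by default).
        Thread.sleep(Long.MaxValue)
      }
    }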


Re: Spark SQL JDBC Connectivity and more

2014-06-09 Thread Venkat Subramanian
1) If I have a standalone spark application that has already built an RDD,
how can SharkServer2, or for that matter Shark, access 'that' RDD and do
queries on it? In all the examples I have seen for Shark, the RDDs (tables) are
created within Shark's own SparkContext and processed there.

This is not possible out of the box with Shark.  If you look at the code for
SharkServer2 though, you'll see that it's just a standard HiveContext under
the covers.  If you modify this startup code, any SchemaRDD you register as
a table in this context will be exposed over JDBC.

[Venkat] Are you saying: pull the SharkServer2 code into my standalone
spark application (as part of the standalone application process), pass in
the spark context of the standalone app to the SharkServer2 SparkContext at
startup, and voila, we get SQL/JDBC interfaces for the RDDs of the
standalone app that are exposed as tables? Thanks for the clarification.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p7264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark SQL JDBC Connectivity and more

2014-05-29 Thread Michael Armbrust
On Thu, May 29, 2014 at 3:26 PM, Venkat Subramanian 
wrote:
>
> 1) If I have a standalone spark application that has already built an RDD,
> how can SharkServer2, or for that matter Shark, access 'that' RDD and do
> queries on it? In all the examples I have seen for Shark, the RDDs (tables) are
> created within Shark's own SparkContext and processed there.
>

This is not possible out of the box with Shark.  If you look at the code
for SharkServer2 though, you'll see that it's just a standard HiveContext
under the covers.  If you modify this startup code, any SchemaRDD you
register as a table in this context will be exposed over JDBC.
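
A minimal sketch of that idea (schema and names are illustrative; on Spark 1.0.x
the call is registerAsTable rather than registerTempTable):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    case class Reading(sensor: String, value: Double)

    object RegisterRddAsTable {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("register-rdd"))
        val hiveContext = new HiveContext(sc)

        val readings = sc.parallelize(Seq(Reading("a", 1.0), Reading("b", 2.0)))
        // Turn the case-class RDD into a SchemaRDD and register it as a table.
        hiveContext.createSchemaRDD(readings).registerTempTable("readings")

        hiveContext.sql("SELECT sensor, value FROM readings").collect().foreach(println)
        // If the JDBC server is started against this same hiveContext, the
        // "readings" table is visible to external JDBC clients as well.
      }
    }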

> 2) I have two applications, one used for processing and computing an output RDD
> from an input and another for post-processing the resultant RDD into
> multiple persistent stores + doing other things with it.  These are split
> into separate processes intentionally. How do we share the output RDD from
> the first application to the second application without writing to disk
> (thinking of serializing the RDD and streaming through Kafka, but then we
> lose time and all the fault tolerance that RDDs bring)? Is Tachyon the only
> other way? Are there other models/design patterns for applications that
> share RDDs, as this may be a very common use case?
>

Yeah, I think Tachyon is the best way to share RDDs between Spark Contexts.
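
For example (a sketch only; the Tachyon master URL and path are placeholders, and
it assumes the Tachyon client is on the classpath and registered as a Hadoop
filesystem), the hand-off can just be a write and a read through tachyon:// paths:

    import org.apache.spark.{SparkConf, SparkContext}

    // First application: write the RDD out through Tachyon's Hadoop-compatible FS.
    object ShareViaTachyonProducer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("producer"))
        val result = sc.parallelize(1 to 100).map(n => s"row-$n")
        result.saveAsTextFile("tachyon://tachyon-master:19998/shared/output")
      }
    }

    // Second application (separate SparkContext): read it back, no HDFS round trip.
    object ShareViaTachyonConsumer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("consumer"))
        val shared = sc.textFile("tachyon://tachyon-master:19998/shared/output")
        println(shared.count())
      }
    }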


Re: Spark SQL JDBC Connectivity and more

2014-05-29 Thread Venkat Subramanian
Thanks Michael.
OK will try SharkServer2..

But I have some basic questions on a related area:

1) If I have a standalone spark application that has already built an RDD,
how can SharkServer2, or for that matter Shark, access 'that' RDD and do
queries on it? In all the examples I have seen for Shark, the RDDs (tables) are
created within Shark's own SparkContext and processed there.

I have stylized the real problem we have, which is: "we have a standalone
spark application that is processing DStreams and producing output DStreams.
I want to expose that near real-time DStream data to a 3rd-party app via
JDBC and allow the SharkServer2 CLI to operate on and query the DStreams in
real time, all from memory". Currently we are writing the output stream to
Cassandra and exposing it to the 3rd-party app via JDBC, but we want to
avoid that extra disk write, which increases latency.
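
One way to picture the in-memory part of this (an illustrative sketch only; the
source, schema, and table name are made up, and a JDBC server would have to be
started against the same hiveContext) is to re-register each micro-batch under a
fixed table name:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    case class Event(key: String, value: Double)

    object StreamingTableRefresh {
      def main(args: Array[String]): Unit = {
        val sc  = new SparkContext(new SparkConf().setAppName("stream-to-sql"))
        val ssc = new StreamingContext(sc, Seconds(10))
        val hiveContext = new HiveContext(sc)

        val lines  = ssc.socketTextStream("localhost", 9999)
        val events = lines.map(_.split(",")).map(a => Event(a(0), a(1).toDouble))

        // Re-register the latest micro-batch under a fixed name; SQL/JDBC queries
        // against "events_latest" then always see the most recent batch.
        events.foreachRDD { rdd =>
          hiveContext.createSchemaRDD(rdd).registerTempTable("events_latest")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }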

2) I have two applications, one used for processing and computing an output RDD
from an input and another for post-processing the resultant RDD into
multiple persistent stores + doing other things with it.  These are split
into separate processes intentionally. How do we share the output RDD from
the first application to the second application without writing to disk
(thinking of serializing the RDD and streaming through Kafka, but then we
lose time and all the fault tolerance that RDDs bring)? Is Tachyon the only
other way? Are there other models/design patterns for applications that
share RDDs, as this may be a very common use case?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p6543.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark SQL JDBC Connectivity

2014-05-29 Thread Michael Armbrust
On Wed, May 28, 2014 at 11:39 PM, Venkat Subramanian wrote:

> We are planning to use the latest Spark SQL on RDDs. If a third party
> application wants to connect to Spark via JDBC, does Spark SQL have
> support?
> (We want to avoid going through the Shark/Hive JDBC layer as we need good
> performance).
>

 We don't have a full release yet, but there is a branch on the Shark
github repository that has a version of SharkServer2 that uses Spark SQL.
 We also plan to port the Shark CLI, but this is not yet finished.  You can
find this branch along with documentation here:
https://github.com/amplab/shark/tree/sparkSql

Note that this version has not yet received much testing (outside of the
integration tests that are run on Spark SQL).  That said, I would love for
people to test it out and report any problems or missing features.  Any
help here would be greatly appreciated!


> BTW, we also want to do the same for Spark Streaming - will Spark SQL work
> on DStreams (since the underlying structure is an RDD anyway), and can we
> expose the streaming DStream RDDs through JDBC via Spark SQL for real-time
> analytics?
>

 We have talked about doing this, but this is not currently on the near
term road map.


Spark SQL JDBC Connectivity

2014-05-28 Thread Venkat Subramanian
We are planning to use the latest Spark SQL on RDDs. If a third party
application wants to connect to Spark via JDBC, does Spark SQL have support?
(We want to avoid going through the Shark/Hive JDBC layer as we need good
performance).

BTW, we also want to do the same for Spark Streaming - will Spark SQL work
on DStreams (since the underlying structure is an RDD anyway), and can we expose
the streaming DStream RDDs through JDBC via Spark SQL for real-time analytics?

Any pointers on this will greatly help.

Regards,

Venkat



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.