I'll add that there is an experimental method that allows you to start the JDBC server with an existing HiveContext (which might have registered temporary tables):
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42
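A rough sketch of how that entry point could be used from an application (the data source, path, app name, and table name are just illustrative; it assumes a Spark build with Hive support):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Build a HiveContext and register a temporary table in this process.
val sc = new SparkContext(new SparkConf().setAppName("jdbc-with-temp-tables"))
val hiveContext = new HiveContext(sc)
hiveContext.jsonFile("/tmp/countries.json").registerTempTable("countries")

// Experimental entry point (linked above): starts the Thrift/JDBC server
// inside this same process, sharing this HiveContext - and therefore its
// temporary tables - with beeline/JDBC clients.
HiveThriftServer2.startWithContext(hiveContext)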
On Thu, Dec 11, 2014 at 6:52 AM, Denny Lee <denny.g....@gmail.com> wrote:

> Yes, that is correct. A quick reference on this is the post
> https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
> with the pertinent section being:
>
> It is important to note that when you create Spark tables (for example,
> via .registerTempTable), these operate within the Spark environment, which
> resides in a separate process from the Hive Metastore. This means that,
> currently, tables created within the Spark context are not available
> through the Thrift server. To work around this, save your temporary table
> into Hive from within the Spark context - then the Spark Thrift Server
> will be able to see the table.
>
> HTH!
>
> On Thu, Dec 11, 2014 at 04:09 Anas Mosaad <anas.mos...@incorta.com> wrote:
>
>> Actually, I came to the conclusion that RDDs have to be persisted in Hive
>> in order to be accessible through Thrift.
>> Hope I didn't end up with an incorrect conclusion.
>> Please, someone correct me if I am wrong.
>> On Dec 11, 2014 8:53 AM, "Judy Nash" <judyn...@exchange.microsoft.com>
>> wrote:
>>
>>> Looks like you are wondering why you cannot see the RDD table you have
>>> created via Thrift?
>>>
>>> Based on my own experience with Spark 1.1, an RDD created directly via
>>> Spark SQL (i.e. the Spark shell or spark-sql.sh) is not visible to Thrift,
>>> since Thrift has its own session containing its own RDDs.
>>>
>>> Spark SQL experts on the forum can confirm this, though.
>>>
>>> *From:* Cheng Lian [mailto:lian.cs....@gmail.com]
>>> *Sent:* Tuesday, December 9, 2014 6:42 AM
>>> *To:* Anas Mosaad
>>> *Cc:* Judy Nash; user@spark.apache.org
>>> *Subject:* Re: Spark-SQL JDBC driver
>>>
>>> According to the stack trace, you were still using SQLContext rather than
>>> HiveContext. To interact with Hive, HiveContext *must* be used.
>>>
>>> Please refer to this page:
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>>>
>>> On 12/9/14 6:26 PM, Anas Mosaad wrote:
>>>
>>> Back to the first question: does this mandate that Hive is up and
>>> running?
>>>
>>> When I try it, I get the following exception. The documentation says
>>> that this method works only on a SchemaRDD. I thought that was the reason
>>> countries.saveAsTable did not work, so I created a tmp RDD that contains
>>> the results from the registered temp table, which I could validate is a
>>> SchemaRDD, as shown below.
>>>
>>> *@Judy,* I really do appreciate your kind support and I want to
>>> understand, and of course don't want to waste your time. If you can
>>> direct me to the documentation describing these details, that would be
>>> great.
>>>
>>> scala> val tmp = sqlContext.sql("select * from countries")
>>> tmp: org.apache.spark.sql.SchemaRDD =
>>> SchemaRDD[12] at RDD at SchemaRDD.scala:108
>>> == Query Plan ==
>>> == Physical Plan ==
>>> PhysicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>>>
>>> scala> tmp.saveAsTable("Countries")
>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
>>> 'CreateTableAsSelect None, Countries, false, None
>>>  Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
>>>   Subquery countries
>>>    LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
>>>
>>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
>>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
>>> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>>> at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
>>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
>>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>> at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>> at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>>> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>>> at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
>>> at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
>>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
>>> at $iwC$$iwC$$iwC.<init>(<console>:29)
>>> at $iwC$$iwC.<init>(<console>:31)
>>> at $iwC.<init>(<console>:33)
>>> at <init>(<console>:35)
>>> at .<init>(<console>:39)
>>> at .<clinit>(<console>)
>>> at .<init>(<console>:7)
>>> at .<clinit>(<console>)
>>> at $print(<console>)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>>> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>>> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
>>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
>>> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
>>> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
>>> at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
>>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
>>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>>> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
>>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
>>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>>> at org.apache.spark.repl.Main.main(Main.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
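A minimal sketch of the HiveContext route described in the replies below (the CSV path, parsing, and case class are illustrative, and it assumes spark-shell was built with Hive support so that a metastore-backed HiveContext is available):

import org.apache.spark.sql.hive.HiveContext

// A tiny illustrative schema; the real countries table has more columns.
case class Country(countryId: Int, countryIsoCode: String, countryName: String)

val hiveContext = new HiveContext(sc)
import hiveContext._

val countriesRdd = sc.textFile("/tmp/countries.csv")
  .map(_.split(","))
  .map(r => Country(r(0).toInt, r(1), r(2)))

// registerTempTable keeps the table inside this spark-shell process only ...
countriesRdd.registerTempTable("countries")

// ... while saveAsTable on a HiveContext-backed SchemaRDD persists it through
// the Hive metastore, so the Thrift/JDBC server and beeline can see it.
hiveContext.sql("SELECT * FROM countries").saveAsTable("countries_persisted")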
>>>
>>> On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>>
>>> How did you register the table under spark-shell? Two things to notice:
>>>
>>> 1. To interact with Hive, HiveContext instead of SQLContext must be used.
>>> 2. `registerTempTable` doesn't persist the table into the Hive metastore,
>>> and the table is lost after quitting spark-shell. Instead, you must use
>>> `saveAsTable`.
>>>
>>> On 12/9/14 5:27 PM, Anas Mosaad wrote:
>>>
>>> Thanks Cheng,
>>>
>>> I thought spark-sql was using the same exact metastore, right? However,
>>> it didn't work as expected. Here's what I did:
>>>
>>> In spark-shell, I loaded a CSV file and registered the table, say
>>> countries.
>>> Started the Thrift server.
>>> Connected using beeline. When I run show tables or !tables, I get an
>>> empty list of tables, as follows:
>>>
>>> *0: jdbc:hive2://localhost:10000> !tables*
>>> *+------------+--------------+-------------+-------------+----------+*
>>> *| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |*
>>> *+------------+--------------+-------------+-------------+----------+*
>>> *+------------+--------------+-------------+-------------+----------+*
>>> *0: jdbc:hive2://localhost:10000> show tables ;*
>>> *+---------+*
>>> *| result  |*
>>> *+---------+*
>>> *+---------+*
>>> *No rows selected (0.106 seconds)*
>>> *0: jdbc:hive2://localhost:10000>*
>>>
>>> Kindly advise: what am I missing? I want to read the RDD using SQL from
>>> outside spark-shell (i.e. like any other relational database).
>>>
>>> On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>>
>>> Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
>>> HiveServer2. You don't need to run Hive, but you do need a working
>>> metastore.
>>>
>>> On 12/9/14 3:59 PM, Anas Mosaad wrote:
>>>
>>> Thanks Judy, this is exactly what I'm looking for. However, and please
>>> forgive me if it's a dumb question: it seems to me that Thrift is the
>>> same as the hive2 JDBC driver, so does this mean that starting Thrift
>>> will start Hive as well on the server?
>>>
>>> On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash <judyn...@exchange.microsoft.com> wrote:
>>>
>>> You can use the Thrift server for this purpose, then test it with beeline.
>>>
>>> See doc:
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
>>>
>>> *From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
>>> *Sent:* Monday, December 8, 2014 11:01 AM
>>> *To:* user@spark.apache.org
>>> *Subject:* Spark-SQL JDBC driver
>>>
>>> Hello Everyone,
>>>
>>> I'm brand new to Spark and was wondering if there's a JDBC driver to
>>> access Spark SQL directly. I'm running Spark in standalone mode and don't
>>> have Hadoop in this environment.
>>>
>>> --
>>>
>>> *Best Regards/أطيب المنى,*
>>>
>>> *Anas Mosaad*
>>> *Incorta Inc.*
>>> *+20-100-743-4510*
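For completeness, once a table has been saved through the metastore and the Thrift server is running, a plain JDBC client can query it. A minimal sketch (host, port, empty credentials, and the countries table are illustrative; it assumes the Hive JDBC driver and its dependencies are on the classpath):

import java.sql.DriverManager

// The Spark Thrift server speaks the HiveServer2 protocol, so the standard
// Hive JDBC driver is used to connect to it.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
try {
  val rs = conn.createStatement().executeQuery("SELECT COUNTRY_NAME FROM countries LIMIT 10")
  while (rs.next()) {
    println(rs.getString(1))
  }
} finally {
  conn.close()
}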