Hi Arush,

So yes, I want to create the tables through Spark SQL. I have placed the
hive-site.xml file inside the $SPARK_HOME/conf directory; I thought that
was all I should need to do to have the thriftserver use it. Perhaps my
hive-site.xml is wrong, it currently looks like this:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>

Which leads me to believe it is going to pull from the metastore thrift
service on the Hortonworks sandbox? I will go look at the docs to see if
this is right; it is what Hortonworks says to do. Do you have an example
hive-site.xml by chance that works with Spark SQL? I am using Tableau 8.3
with the SparkSQL connector.

Thanks for the assistance.

-Todd
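(As a sanity check that the thrift server is actually reading this file,
a minimal sketch of the --files suggestion quoted below; the path is just
a placeholder for wherever hive-site.xml lives:

    $SPARK_HOME/sbin/start-thriftserver.sh --files /path/to/hive-site.xml

If the hive.metastore.uris setting above is picked up, the thrift server
should list tables from the sandbox metastore rather than from a local
metastore_db directory.)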
On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda
<ar...@sigmoidanalytics.com> wrote:

> BTW, what Tableau connector are you using?
>
> On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda <
> ar...@sigmoidanalytics.com> wrote:
>
>> I am a little confused here: why do you want to create the tables in
>> Hive? You want to create the tables in Spark SQL, right?
>>
>> If you are not able to find the same tables through Tableau, then
>> thrift is connecting to a different metastore than your spark-shell.
>>
>> One way to specify a metastore to thrift is to provide the path to
>> hive-site.xml while starting thrift, using --files hive-site.xml.
>>
>> Similarly, you can specify the same metastore to your spark-submit or
>> spark-shell using the same option.
>>
>> On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist <tsind...@gmail.com> wrote:
>>
>>> Arush,
>>>
>>> As for #2, do you mean something like this from the docs:
>>>
>>> // sc is an existing SparkContext.
>>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>
>>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>> sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
>>>
>>> // Queries are expressed in HiveQL
>>> sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
>>>
>>> Or did you have something else in mind?
>>>
>>> -Todd
>>>
>>> On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:
>>>
>>>> Arush,
>>>>
>>>> Thank you, I will take a look at that approach in the morning. I sort
>>>> of figured the answer to #1 was NO and that I would need to do 2 and
>>>> 3; thanks for clarifying it for me.
>>>>
>>>> -Todd
>>>>
>>>> On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <
>>>> ar...@sigmoidanalytics.com> wrote:
>>>>
>>>>> 1. Can the connector fetch or query SchemaRDDs saved to Parquet or
>>>>> JSON files? NO.
>>>>>
>>>>> 2. Do I need to do something to expose these via hive / metastore
>>>>> other than creating a table in hive? Create a table in Spark SQL to
>>>>> expose it via Spark SQL.
>>>>>
>>>>> 3. Does the thriftserver need to be configured to expose these in
>>>>> some fashion, sort of related to question 2? You would need to
>>>>> configure thrift to read from the metastore you expect it to read
>>>>> from; by default it reads from the metastore_db directory present in
>>>>> the directory used to launch the thrift server.
>>>>>
>>>>> On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to understand how and what the Tableau connector to
>>>>>> SparkSQL is able to access. My understanding is that it needs to
>>>>>> connect to the thriftserver, and I am not sure how or if it exposes
>>>>>> parquet, json, SchemaRDDs, or if it only exposes schemas defined in
>>>>>> the metastore / hive.
>>>>>>
>>>>>> For example, I do the following from the spark-shell, which
>>>>>> generates a SchemaRDD from a csv file and saves it as a JSON file
>>>>>> as well as a parquet file.
>>>>>>
>>>>>> import org.apache.spark.sql.SQLContext
>>>>>> import com.databricks.spark.csv._
>>>>>>
>>>>>> val sqlContext = new SQLContext(sc)
>>>>>> val test = sqlContext.csvFile("/data/test.csv")
>>>>>> test.toJSON.saveAsTextFile("/data/out.json")
>>>>>> test.saveAsParquetFile("/data/out.parquet")
>>>>>>
>>>>>> When I connect from Tableau, the only thing I see is the "default"
>>>>>> schema and nothing in the tables section.
>>>>>>
>>>>>> So my questions are:
>>>>>>
>>>>>> 1. Can the connector fetch or query SchemaRDDs saved to Parquet or
>>>>>> JSON files?
>>>>>> 2. Do I need to do something to expose these via hive / metastore
>>>>>> other than creating a table in hive?
>>>>>> 3. Does the thriftserver need to be configured to expose these in
>>>>>> some fashion, sort of related to question 2.
>>>>>>
>>>>>> TIA for the assistance.
>>>>>>
>>>>>> -Todd
>>>>>>
>
> --
>
> *Arush Kharbanda* || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
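(Following up on answer 2 above, a rough sketch only of one way to wire
this together. In Spark 1.2 a temporary table is visible only inside the
session that creates it, so for Tableau to see the parquet output it
would have to be registered in the thrift server's own session, e.g. via
beeline; the table name, path, and JDBC URL below are placeholders:

    $SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000

    -- register the parquet files written from the spark-shell above
    CREATE TEMPORARY TABLE test
    USING org.apache.spark.sql.parquet
    OPTIONS (path "/data/out.parquet");

A table that should outlive the session would instead need to be created
in the shared Hive metastore, e.g. through a HiveContext as in the docs
snippet quoted earlier.)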