A workaround for now would be to save the JSON as Parquet and then create a metastore Parquet table. Using Parquet will be much faster for repeated querying. This function might be helpful:
import org.apache.spark.sql.hive.HiveMetastoreTypes

def createParquetTable(name: String, file: String, sqlContext: SQLContext): Unit = {
  import sqlContext._
  val rdd = parquetFile(file)
  val schema = rdd.schema.fields.map(f =>
    s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}

On Tue, Sep 23, 2014 at 10:49 AM, Michael Armbrust <mich...@databricks.com> wrote:

> You can't directly query JSON tables from the CLI or JDBC server since
> temporary tables only live for the life of the Spark Context. This PR will
> eventually (targeted for 1.2) let you do what you want in pure SQL:
> https://github.com/apache/spark/pull/2475
>
> On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>
>> Hi Gaurav,
>>
>> It seems metastore should be created by LocalHiveContext and metastore_db
>> should be created by a regular HiveContext. Can you check whether you are
>> still using LocalHiveContext when you try to access your tables? Also, if
>> you created those tables when you launched the SQL CLI under bin/, you can
>> launch the SQL CLI in the same directory (bin/) and Spark SQL should be
>> able to connect to the metastore without any extra settings.
>>
>> BTW, can you let me know your settings in hive-site?
>>
>> Thanks,
>>
>> Yin
>>
>> On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari <gtins...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried setting the metastore and metastore_db locations in
>>> *conf/hive-site.xml* to the directories created in the Spark bin folder
>>> (they were created when I ran the Spark shell and used LocalHiveContext),
>>> but it still doesn't work.
>>>
>>> Do I need to save my RDD as a table through a Hive context to make this
>>> work?
>>>
>>> Regards,
>>> Gaurav
>>>
>>> On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>>
>>>> Hi Gaurav,
>>>>
>>>> Can you put hive-site.xml in conf/ and try again?
>>>>
>>>> Thanks,
>>>>
>>>> Yin
>>>>
>>>> On Mon, Sep 22, 2014 at 4:02 PM, gtinside <gtins...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have been using the Spark shell to execute all SQL. I am connecting
>>>>> to Cassandra, converting the data to JSON, and then running queries on
>>>>> it. I am using HiveContext (and not SQLContext) because of the
>>>>> "explode" functionality in it.
>>>>>
>>>>> I want to see how I can use the Spark SQL CLI to run queries directly
>>>>> on a saved table. I see metastore and metastore_db getting created in
>>>>> the Spark bin directory (my Hive context is LocalHiveContext). I tried
>>>>> executing queries in the spark-sql CLI after putting in a hive-site.xml
>>>>> with the metastore and metastore_db directories set to the ones in
>>>>> Spark bin, but it doesn't seem to be working. I am getting
>>>>> "org.apache.hadoop.hive.ql.metadata.HiveException:
>>>>> Unable to fetch table test_tbl".
>>>>>
>>>>> Is this possible?
>>>>>
>>>>> Regards,
>>>>> Gaurav
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
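As a rough end-to-end sketch of the workaround at the top of the thread: first convert the JSON to Parquet, then register a metastore table over it with the createParquetTable helper. This assumes the Spark 1.1-era SchemaRDD API (jsonFile / saveAsParquetFile); the paths and app name are hypothetical placeholders, not anything from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object JsonToParquetTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("json-to-parquet"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical paths -- substitute your own.
    val jsonPath = "/data/events.json"
    val parquetPath = "/data/events.parquet"

    // One-time conversion: infer the schema from the JSON and
    // write the data back out as Parquet (Spark 1.1 SchemaRDD API).
    hiveContext.jsonFile(jsonPath).saveAsParquetFile(parquetPath)

    // Register a metastore table over the Parquet data using the helper
    // above (a HiveContext is a SQLContext), so the table survives the
    // Spark Context and is visible to the SQL CLI / JDBC server.
    createParquetTable("events", parquetPath, hiveContext)
  }
}
```

After this runs once, `SELECT * FROM events` should work from the spark-sql CLI against the same metastore, and repeated queries read the columnar Parquet data rather than re-parsing JSON.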