Thanks, will give it a try, appreciate your help

Regards,
Gaurav

On Sep 23, 2014 1:52 PM, "Michael Armbrust" <mich...@databricks.com> wrote:
> A workaround for now would be to save the JSON as parquet and then create
> a metastore parquet table. Using parquet will be much faster for repeated
> querying. This function might be helpful:
>
> import org.apache.spark.sql.hive.HiveMetastoreTypes
>
> def createParquetTable(name: String, file: String, sqlContext: SQLContext): Unit = {
>   import sqlContext._
>
>   val rdd = parquetFile(file)
>   val schema = rdd.schema.fields.map(f =>
>     s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
>   val ddl = s"""
>     |CREATE EXTERNAL TABLE $name (
>     |  $schema
>     |)
>     |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>     |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
>     |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
>     |LOCATION '$file'""".stripMargin
>   sql(ddl)
>   setConf("spark.sql.hive.convertMetastoreParquet", "true")
> }
>
> On Tue, Sep 23, 2014 at 10:49 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> You can't directly query JSON tables from the CLI or JDBC server, since
>> temporary tables only live for the life of the SparkContext. This PR will
>> eventually (targeted for 1.2) let you do what you want in pure SQL:
>> https://github.com/apache/spark/pull/2475
>>
>> On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>
>>> Hi Gaurav,
>>>
>>> It seems metastore should be created by LocalHiveContext and metastore_db
>>> should be created by a regular HiveContext. Can you check whether you are
>>> still using LocalHiveContext when you try to access your tables? Also, if
>>> you created those tables when you launched your sql cli under bin/, you
>>> can launch the sql cli in the same dir (bin/) and Spark SQL should be
>>> able to connect to the metastore without any extra settings.
>>>
>>> By the way, can you let me know your settings in hive-site?
>>>
>>> Thanks,
>>>
>>> Yin
>>>
>>> On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari <gtins...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried setting the metastore and metastore_db locations in
>>>> *conf/hive-site.xml* to the directories created in the spark bin folder
>>>> (they were created when I ran the spark shell and used LocalHiveContext),
>>>> but it still doesn't work.
>>>>
>>>> Do I need to save my RDD as a table through a hive context to make this
>>>> work?
>>>>
>>>> Regards,
>>>> Gaurav
>>>>
>>>> On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>>>
>>>>> Hi Gaurav,
>>>>>
>>>>> Can you put hive-site.xml in conf/ and try again?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yin
>>>>>
>>>>> On Mon, Sep 22, 2014 at 4:02 PM, gtinside <gtins...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have been using the spark shell to execute all SQL. I am connecting
>>>>>> to Cassandra, converting the data to JSON, and then running queries on
>>>>>> it. I am using HiveContext (and not SQLContext) because of the
>>>>>> "explode" functionality in it.
>>>>>>
>>>>>> I want to see how I can use the Spark SQL CLI to run queries directly
>>>>>> on a saved table. I see metastore and metastore_db getting created in
>>>>>> the spark bin directory (my hive context is a LocalHiveContext). I
>>>>>> tried executing queries in the spark-sql cli after putting in a
>>>>>> hive-site.xml with the metastore and metastore_db directories set to
>>>>>> the ones in spark bin, but it doesn't seem to be working. I am getting
>>>>>> "org.apache.hadoop.hive.ql.metadata.HiveException:
>>>>>> Unable to fetch table test_tbl".
>>>>>>
>>>>>> Is this possible?
>>>>>>
>>>>>> Regards,
>>>>>> Gaurav
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
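
Yin's question about the hive-site settings above comes down to where the Derby metastore database lives. A minimal sketch of a conf/hive-site.xml that points the CLI at an already-created metastore_db directory — the paths are placeholders, while `javax.jdo.option.ConnectionURL` and `hive.metastore.warehouse.dir` are standard Hive configuration properties:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Point the embedded Derby metastore at the metastore_db directory
       the spark shell created (absolute path is a placeholder). -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/spark/bin/metastore_db;create=true</value>
  </property>
  <!-- Default location for managed table data (placeholder path). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/path/to/warehouse</value>
  </property>
</configuration>
```

With embedded Derby only one process can open the metastore at a time, which is one reason the thread suggests simply launching spark-sql from the same directory (bin/) where metastore_db was created.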
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
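
Michael's createParquetTable helper in the thread assumes the JSON data has already been written out as parquet. A minimal sketch of that first step with the Spark 1.1-era SchemaRDD API, under the assumption of an existing SparkContext `sc`; the paths are placeholders:

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is an existing SparkContext (provided by the spark shell).
val hiveContext = new HiveContext(sc)

// Load the JSON file (the schema is inferred from the records),
// then write the data back out as a parquet directory.
val events = hiveContext.jsonFile("/data/events.json")
events.saveAsParquetFile("/data/events.parquet")

// The parquet directory can then be registered as a metastore table,
// e.g. with the createParquetTable helper quoted above, after which it
// is queryable from the Spark SQL CLI without re-reading the JSON.
```

This is a sketch, not a definitive recipe: it requires a running Spark 1.1 shell, and the table registration still goes through the Hive metastore configured in hive-site.xml.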