Thanks, will give it a try, appreciate your help

Regards,
Gaurav

On Sep 23, 2014 1:52 PM, "Michael Armbrust" <mich...@databricks.com> wrote:
> A workaround for now would be to save the JSON as parquet and then create
> a metastore parquet table. Using parquet will be much faster for repeated
> querying. This function might be helpful:
>
> import org.apache.spark.sql.hive.HiveMetastoreTypes
>
> def createParquetTable(name: String, file: String, sqlContext: SQLContext): Unit = {
>   import sqlContext._
>
>   val rdd = parquetFile(file)
>   val schema = rdd.schema.fields.map(f =>
>     s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
>   val ddl = s"""
>     |CREATE EXTERNAL TABLE $name (
>     |  $schema
>     |)
>     |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>     |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
>     |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
>     |LOCATION '$file'""".stripMargin
>   sql(ddl)
>   setConf("spark.sql.hive.convertMetastoreParquet", "true")
> }
>
> On Tue, Sep 23, 2014 at 10:49 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> You can't directly query JSON tables from the CLI or JDBC server, since
>> temporary tables only live for the life of the SparkContext. This PR will
>> eventually (targeted for 1.2) let you do what you want in pure SQL:
>> https://github.com/apache/spark/pull/2475
>>
>> On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>
>>> Hi Gaurav,
>>>
>>> It seems metastore should be created by LocalHiveContext and metastore_db
>>> should be created by a regular HiveContext. Can you check whether you are
>>> still using LocalHiveContext when you try to access your tables? Also, if
>>> you created those tables when you launched your sql cli under bin/, you
>>> can launch the sql cli in the same dir (bin/) and Spark SQL should be
>>> able to connect to the metastore without any extra settings.
>>>
>>> By the way, can you let me know your settings in hive-site?
>>>
>>> Thanks,
>>>
>>> Yin
>>>
>>> On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari <gtins...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried setting the metastore and metastore_db locations in
>>>> *conf/hive-site.xml* to the directories created in the spark bin folder
>>>> (they were created when I ran the spark shell and used LocalHiveContext),
>>>> but it still doesn't work.
>>>>
>>>> Do I need to save my RDD as a table through a hive context to make this
>>>> work?
>>>>
>>>> Regards,
>>>> Gaurav
>>>>
>>>> On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>>>
>>>>> Hi Gaurav,
>>>>>
>>>>> Can you put hive-site.xml in conf/ and try again?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yin
>>>>>
>>>>> On Mon, Sep 22, 2014 at 4:02 PM, gtinside <gtins...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have been using the spark shell to execute all SQL. I am connecting
>>>>>> to Cassandra, converting the data to JSON, and then running queries on
>>>>>> it. I am using HiveContext (and not SQLContext) because of the
>>>>>> "explode" functionality in it.
>>>>>>
>>>>>> I want to see how I can use the Spark SQL CLI to run queries directly
>>>>>> on a saved table. I see metastore and metastore_db getting created in
>>>>>> the spark bin directory (my hive context is a LocalHiveContext). I
>>>>>> tried executing queries in the spark-sql cli after putting in a
>>>>>> hive-site.xml with the metastore and metastore_db directories set to
>>>>>> the ones in spark bin, but it doesn't seem to be working. I am getting
>>>>>> "org.apache.hadoop.hive.ql.metadata.HiveException:
>>>>>> Unable to fetch table test_tbl".
>>>>>>
>>>>>> Is this possible?
>>>>>>
>>>>>> Regards,
>>>>>> Gaurav
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
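
Yin's question about the hive-site settings above comes down to where the Derby metastore database lives. A minimal sketch of a conf/hive-site.xml that points the CLI at an already-created metastore_db directory — the paths are placeholders, while `javax.jdo.option.ConnectionURL` and `hive.metastore.warehouse.dir` are standard Hive configuration properties:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Point the embedded Derby metastore at the metastore_db directory
       the spark shell created (absolute path is a placeholder). -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/spark/bin/metastore_db;create=true</value>
  </property>
  <!-- Default location for managed table data (placeholder path). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/path/to/warehouse</value>
  </property>
</configuration>
```

With embedded Derby only one process can open the metastore at a time, which is one reason the thread suggests simply launching spark-sql from the same directory (bin/) where metastore_db was created.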
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
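
Michael's createParquetTable helper in the thread assumes the JSON data has already been written out as parquet. A minimal sketch of that first step with the Spark 1.1-era SchemaRDD API, under the assumption of an existing SparkContext `sc`; the paths are placeholders:

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is an existing SparkContext (provided by the spark shell).
val hiveContext = new HiveContext(sc)

// Load the JSON file (the schema is inferred from the records),
// then write the data back out as a parquet directory.
val events = hiveContext.jsonFile("/data/events.json")
events.saveAsParquetFile("/data/events.parquet")

// The parquet directory can then be registered as a metastore table,
// e.g. with the createParquetTable helper quoted above, after which it
// is queryable from the Spark SQL CLI without re-reading the JSON.
```

This is a sketch, not a definitive recipe: it requires a running Spark 1.1 shell, and the table registration still goes through the Hive metastore configured in hive-site.xml.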