Re: Re: spark sql cli query results written to file ?

2015-12-03 Thread fightf...@163.com
Well, sorry for the late response, and thanks a lot for pointing out the clue.



fightf...@163.com
 
From: Akhil Das
Date: 2015-12-03 14:50
To: Sahil Sareen
CC: fightf...@163.com; user
Subject: Re: spark sql cli query results written to file ?
Oops 3 mins late. :)

Thanks
Best Regards

On Thu, Dec 3, 2015 at 11:49 AM, Sahil Sareen <sareen...@gmail.com> wrote:
Yeah, that's the example from the link I just posted.

-Sahil

On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
Something like this?

val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")

It will save the name and favorite_color columns to a Parquet file. You can
read more in the save-modes section here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
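As an aside beyond this thread: the spark-sql CLI prints query results to stdout, so a plain shell redirect (e.g. `spark-sql -e "SELECT ..." > results.txt`) is often the simplest way to capture CLI output. From Scala, a hedged sketch combining the example above with an explicit save mode and a local output path (the mode, format, and path here are illustrative assumptions, not from the thread):

```scala
// Sketch only: assumes a Spark 1.x SQLContext named sqlContext is in scope.
val df = sqlContext.read.load("examples/src/main/resources/users.parquet")

df.select("name", "favorite_color")
  .write
  .mode("overwrite")   // SaveMode options: "overwrite", "append", "ignore", "error"
  .format("json")      // "json" yields a human-readable local file; "parquet" also works
  .save("file:///tmp/namesAndFavColors.json")
```

The `file://` prefix forces a local-filesystem path rather than the default filesystem (e.g. HDFS) when running against a cluster.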



Thanks
Best Regards

On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com <fightf...@163.com> wrote:
Hi,
How can I save the results of queries run in the Spark SQL CLI and write
them to a local file?
Is there any available command for this?

Thanks,
Sun.



fightf...@163.com





Re: spark sql cli query results written to file ?

2015-12-02 Thread Sahil Sareen
Yeah, that's the example from the link I just posted.

-Sahil

On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das 
wrote:

> Something like this?
>
> val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
> df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
>
>
> It will save the name, favorite_color columns to a parquet file. You can
> read more information over here
> http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
>
>
>
> Thanks
> Best Regards
>
> On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com 
> wrote:
>
>> HI,
>> How could I save the spark sql cli running queries results and write the
>> results to some local file ?
>> Is there any available command ?
>>
>> Thanks,
>> Sun.
>>
>> --
>> fightf...@163.com
>>
>
>


Re: spark sql cli query results written to file ?

2015-12-02 Thread Akhil Das
Something like this?

val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")


It will save the name and favorite_color columns to a Parquet file. You can
read more in the save-modes section here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes



Thanks
Best Regards

On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com 
wrote:

> HI,
> How could I save the spark sql cli running queries results and write the
> results to some local file ?
> Is there any available command ?
>
> Thanks,
> Sun.
>
> --
> fightf...@163.com
>


Re: spark sql cli query results written to file ?

2015-12-02 Thread Sahil Sareen
Did you see: http://spark.apache.org/docs/latest/sql-programming-guide.html

-Sahil

On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com 
wrote:

> HI,
> How could I save the spark sql cli running queries results and write the
> results to some local file ?
> Is there any available command ?
>
> Thanks,
> Sun.
>
> --
> fightf...@163.com
>


Re: spark sql cli query results written to file ?

2015-12-02 Thread Akhil Das
Oops 3 mins late. :)

Thanks
Best Regards

On Thu, Dec 3, 2015 at 11:49 AM, Sahil Sareen  wrote:

> Yeah, Thats the example from the link I just posted.
>
> -Sahil
>
> On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das 
> wrote:
>
>> Something like this?
>>
>> val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
>> df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
>>
>>
>> It will save the name, favorite_color columns to a parquet file. You can
>> read more information over here
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
>>
>>
>>
>> Thanks
>> Best Regards
>>
>> On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com 
>> wrote:
>>
>>> HI,
>>> How could I save the spark sql cli running queries results and write the
>>> results to some local file ?
>>> Is there any available command ?
>>>
>>> Thanks,
>>> Sun.
>>>
>>> --
>>> fightf...@163.com
>>>
>>
>>
>


Re: Spark SQL CLI

2014-09-24 Thread Gaurav Tiwari
Thanks, will give it a try. Appreciate your help.

Regards,
Gaurav
On Sep 23, 2014 1:52 PM, Michael Armbrust mich...@databricks.com wrote:

 A workaround for now would be to save the JSON as parquet and then create a
 metastore parquet table.  Using parquet will be much faster for repeated
 querying. This function might be helpful:

 import org.apache.spark.sql.hive.HiveMetastoreTypes

 def createParquetTable(name: String, file: String, sqlContext:
 SQLContext): Unit = {
   import sqlContext._

   val rdd = parquetFile(file)
   val schema = rdd.schema.fields.map(f =>
     s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
   val ddl = s"""
     |CREATE EXTERNAL TABLE $name (
     |  $schema
     |)
     |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
     |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
     |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
     |LOCATION '$file'""".stripMargin
   sql(ddl)
   setConf("spark.sql.hive.convertMetastoreParquet", "true")
 }

 On Tue, Sep 23, 2014 at 10:49 AM, Michael Armbrust mich...@databricks.com
  wrote:

 You can't directly query JSON tables from the CLI or JDBC server since
 temporary tables only live for the life of the Spark Context.  This PR will
 eventually (targeted for 1.2) let you do what you want in pure SQL:
 https://github.com/apache/spark/pull/2475

 On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Gaurav,

 Seems metastore should be created by LocalHiveContext and metastore_db
 should be created by a regular HiveContext. Can you check if you are still
 using LocalHiveContext when you tried to access your tables? Also, if you
 created those tables when you launched your sql cli under bin/, you can
 launch sql cli in the same dir (bin/) and spark sql should be able to
 connect to the metastore without any setting.

 btw, Can you let me know your settings in hive-site?

 Thanks,

 Yin

 On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari gtins...@gmail.com
 wrote:

 Hi ,

 I tried setting the metastore and metastore_db location in the
 *conf/hive-site.xml *to the directories created in spark bin folder
 (they were created when I ran spark shell and used LocalHiveContext), but
 still doesn't work

 Do I need to save my RDD as a table through the hive context to make this
 work?

 Regards,
 Gaurav

 On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai huaiyin@gmail.com
 wrote:

 Hi Gaurav,

 Can you put hive-site.xml in conf/ and try again?

 Thanks,

 Yin

 On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:

 Hi ,

 I have been using spark shell to execute all SQLs. I am connecting to
 Cassandra , converting the data in JSON and then running queries on
 it,  I
 am using HiveContext (and not SQLContext) because of explode 
 functionality in it.

 I want to see how can I use Spark SQL CLI for directly running the
 queries
 on saved table. I see metastore and metastore_db getting created in
 the
 spark bin directory (my hive context is LocalHiveContext). I tried
 executing
 queries in spark-sql cli after putting in a hive-site.xml with
 metastore and
 metastore db directory same as the one in spark bin,  but it doesn't
 seem to
 be working. I am getting
 org.apache.hadoop.hive.ql.metadata.HiveException:
 Unable to fetch table test_tbl.

 Is this possible ?

 Regards,
 Gaurav



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org









Re: Spark SQL CLI

2014-09-23 Thread Michael Armbrust
You can't directly query JSON tables from the CLI or JDBC server since
temporary tables only live for the life of the Spark Context.  This PR will
eventually (targeted for 1.2) let you do what you want in pure SQL:
https://github.com/apache/spark/pull/2475

On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Gaurav,

 Seems metastore should be created by LocalHiveContext and metastore_db
 should be created by a regular HiveContext. Can you check if you are still
 using LocalHiveContext when you tried to access your tables? Also, if you
 created those tables when you launched your sql cli under bin/, you can
 launch sql cli in the same dir (bin/) and spark sql should be able to
 connect to the metastore without any setting.

 btw, Can you let me know your settings in hive-site?

 Thanks,

 Yin

 On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari gtins...@gmail.com wrote:

 Hi ,

 I tried setting the metastore and metastore_db location in the
 *conf/hive-site.xml *to the directories created in spark bin folder
 (they were created when I ran spark shell and used LocalHiveContext), but
 still doesn't work

 Do I need to save my RDD as a table through the hive context to make this
 work?

 Regards,
 Gaurav

 On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Gaurav,

 Can you put hive-site.xml in conf/ and try again?

 Thanks,

 Yin

 On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:

 Hi ,

 I have been using spark shell to execute all SQLs. I am connecting to
 Cassandra , converting the data in JSON and then running queries on
 it,  I
 am using HiveContext (and not SQLContext) because of explode 
 functionality in it.

 I want to see how can I use Spark SQL CLI for directly running the
 queries
 on saved table. I see metastore and metastore_db getting created in the
 spark bin directory (my hive context is LocalHiveContext). I tried
 executing
 queries in spark-sql cli after putting in a hive-site.xml with
 metastore and
 metastore db directory same as the one in spark bin,  but it doesn't
 seem to
 be working. I am getting
 org.apache.hadoop.hive.ql.metadata.HiveException:
 Unable to fetch table test_tbl.

 Is this possible ?

 Regards,
 Gaurav



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org







Re: Spark SQL CLI

2014-09-23 Thread Michael Armbrust
A workaround for now would be to save the JSON as parquet and then create a
metastore parquet table.  Using parquet will be much faster for repeated
querying. This function might be helpful:

import org.apache.spark.sql.hive.HiveMetastoreTypes

def createParquetTable(name: String, file: String, sqlContext: SQLContext):
Unit = {
  import sqlContext._

  val rdd = parquetFile(file)
  val schema = rdd.schema.fields.map(f =>
    s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}
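A hypothetical usage of the function above (the table name, file path, and the `hiveContext` value are assumptions for illustration, not from the thread):

```scala
// Assumes a HiveContext named hiveContext that points at the same metastore
// the spark-sql CLI uses (i.e. the same conf/hive-site.xml), and that the
// parquet file already exists at the given path.
createParquetTable("users_parquet", "/data/users.parquet", hiveContext)

// The external table should then be queryable from the CLI:
//   spark-sql> SELECT * FROM users_parquet;
```

Because the DDL creates an EXTERNAL table pointing at the existing parquet location, no data is copied; dropping the table later leaves the files in place.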

On Tue, Sep 23, 2014 at 10:49 AM, Michael Armbrust mich...@databricks.com
wrote:

 You can't directly query JSON tables from the CLI or JDBC server since
 temporary tables only live for the life of the Spark Context.  This PR will
 eventually (targeted for 1.2) let you do what you want in pure SQL:
 https://github.com/apache/spark/pull/2475

 On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Gaurav,

 Seems metastore should be created by LocalHiveContext and metastore_db
 should be created by a regular HiveContext. Can you check if you are still
 using LocalHiveContext when you tried to access your tables? Also, if you
 created those tables when you launched your sql cli under bin/, you can
 launch sql cli in the same dir (bin/) and spark sql should be able to
 connect to the metastore without any setting.

 btw, Can you let me know your settings in hive-site?

 Thanks,

 Yin

 On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari gtins...@gmail.com
 wrote:

 Hi ,

 I tried setting the metastore and metastore_db location in the
 *conf/hive-site.xml *to the directories created in spark bin folder
 (they were created when I ran spark shell and used LocalHiveContext), but
 still doesn't work

 Do I need to save my RDD as a table through the hive context to make this
 work?

 Regards,
 Gaurav

 On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Gaurav,

 Can you put hive-site.xml in conf/ and try again?

 Thanks,

 Yin

 On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:

 Hi ,

 I have been using spark shell to execute all SQLs. I am connecting to
 Cassandra , converting the data in JSON and then running queries on
 it,  I
 am using HiveContext (and not SQLContext) because of explode 
 functionality in it.

 I want to see how can I use Spark SQL CLI for directly running the
 queries
 on saved table. I see metastore and metastore_db getting created in the
 spark bin directory (my hive context is LocalHiveContext). I tried
 executing
 queries in spark-sql cli after putting in a hive-site.xml with
 metastore and
 metastore db directory same as the one in spark bin,  but it doesn't
 seem to
 be working. I am getting
 org.apache.hadoop.hive.ql.metadata.HiveException:
 Unable to fetch table test_tbl.

 Is this possible ?

 Regards,
 Gaurav



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org








Re: Spark SQL CLI

2014-09-22 Thread Yin Huai
Hi Gaurav,

Can you put hive-site.xml in conf/ and try again?

Thanks,

Yin
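For reference, a minimal conf/hive-site.xml pointing the metastore at a fixed Derby directory might look like the sketch below; `javax.jdo.option.ConnectionURL` is the standard Hive property, but the path shown is an illustrative assumption:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- Example path only; point this at the metastore_db you want to share -->
    <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
  </property>
</configuration>
```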

On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:

 Hi ,

 I have been using the spark shell to execute all SQLs. I am connecting to
 Cassandra, converting the data to JSON, and then running queries on it. I
 am using HiveContext (and not SQLContext) because of the explode
 functionality in it.

 I want to see how I can use the Spark SQL CLI to directly run queries on a
 saved table. I see metastore and metastore_db getting created in the spark
 bin directory (my hive context is LocalHiveContext). I tried executing
 queries in the spark-sql CLI after putting in a hive-site.xml with the
 metastore and metastore_db directories the same as the ones in spark bin,
 but it doesn't seem to be working. I am getting
 org.apache.hadoop.hive.ql.metadata.HiveException:
 Unable to fetch table test_tbl.

 Is this possible ?

 Regards,
 Gaurav



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org