Re: Re: spark sql cli query results written to file ?
Well, sorry for the late response, and thanks a lot for pointing out the clue.

fightf...@163.com

From: Akhil Das
Date: 2015-12-03 14:50
To: Sahil Sareen
CC: fightf...@163.com; user
Subject: Re: spark sql cli query results written to file ?

Oops 3 mins late. :)

Thanks
Best Regards
Re: spark sql cli query results written to file ?
Yeah, that's the example from the link I just posted.

-Sahil

On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das wrote:
> Something like this?
>
> val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
> df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
>
> It will save the name and favorite_color columns to a Parquet file. You can
> read more about save modes here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
>
> Thanks
> Best Regards
Re: spark sql cli query results written to file ?
Something like this?

val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")

It will save the name and favorite_color columns to a Parquet file. You can read more about save modes here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes

Thanks
Best Regards

On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com wrote:
> Hi,
> How can I save the results of queries run in the Spark SQL CLI to a local
> file? Is there a command for this?
>
> Thanks,
> Sun.
>
> --
> fightf...@163.com
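The snippet above writes Parquet. If the goal is a plain local text file, as the original question asks, one option for small results is to collect the rows to the driver and write them out yourself. A minimal sketch, assuming the rows have already been collected as Seq[Seq[Any]]; `LocalResultWriter` and `writeTsv` are hypothetical names, and writing NULLs as the text NULL is an arbitrary choice:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: writes result rows that were already collected to the
// driver into a local tab-separated file, header line first.
object LocalResultWriter {
  // Assumption: SQL NULLs are written as the literal text NULL.
  private def cell(v: Any): String = if (v == null) "NULL" else v.toString

  def writeTsv(path: Path, header: Seq[String], rows: Seq[Seq[Any]]): Unit = {
    // Header first, then one tab-joined line per row.
    val lines = (header +: rows.map(_.map(cell))).map(_.mkString("\t"))
    Files.write(path, (lines.mkString("\n") + "\n").getBytes(StandardCharsets.UTF_8))
  }
}
```

In a Spark 1.x shell this might be called as LocalResultWriter.writeTsv(Paths.get("results.tsv"), df.columns.toSeq, df.collect().map(_.toSeq).toSeq), with Paths from java.nio.file. Because collect() pulls the whole result into driver memory, this only suits small result sets; for large ones, df.write as in the snippet above is the safer route.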
Re: spark sql cli query results written to file ?
Did you see: http://spark.apache.org/docs/latest/sql-programming-guide.html

-Sahil

On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com wrote:
> Hi,
> How can I save the results of queries run in the Spark SQL CLI to a local
> file? Is there a command for this?
>
> Thanks,
> Sun.
>
> --
> fightf...@163.com
Re: spark sql cli query results written to file ?
Oops, 3 mins late. :)

Thanks
Best Regards

On Thu, Dec 3, 2015 at 11:49 AM, Sahil Sareen wrote:
> Yeah, that's the example from the link I just posted.
>
> -Sahil
Re: Spark SQL CLI
Thanks, will give it a try. Appreciate your help.

Regards,
Gaurav

On Sep 23, 2014 1:52 PM, Michael Armbrust mich...@databricks.com wrote:
> A workaround for now would be to save the JSON as Parquet and then create a
> metastore Parquet table. Using Parquet will be much faster for repeated
> querying.
Re: Spark SQL CLI
You can't directly query JSON tables from the CLI or JDBC server, since temporary tables only live as long as the SparkContext. This PR will eventually (targeted for 1.2) let you do what you want in pure SQL: https://github.com/apache/spark/pull/2475

On Mon, Sep 22, 2014 at 4:52 PM, Yin Huai huaiyin@gmail.com wrote:
> Hi Gaurav,
>
> It seems the metastore directory was created by LocalHiveContext and metastore_db by a regular HiveContext. Can you check whether you were still using LocalHiveContext when you tried to access your tables? Also, if you created those tables while running the SQL CLI from bin/, you can launch the SQL CLI from the same directory (bin/) and Spark SQL should be able to connect to the metastore without any extra settings. BTW, can you share your settings in hive-site.xml?
>
> Thanks,
> Yin
>
> On Mon, Sep 22, 2014 at 7:18 PM, Gaurav Tiwari gtins...@gmail.com wrote:
>> Hi,
>>
>> I tried setting the metastore and metastore_db locations in conf/hive-site.xml to the directories created in the Spark bin folder (they were created when I ran the Spark shell and used LocalHiveContext), but it still doesn't work. Do I need to save my RDD as a table through the HiveContext to make this work?
>>
>> Regards,
>> Gaurav
>>
>> On Mon, Sep 22, 2014 at 6:30 PM, Yin Huai huaiyin@gmail.com wrote:
>>> Hi Gaurav, can you put hive-site.xml in conf/ and try again? Thanks, Yin
Re: Spark SQL CLI
A workaround for now would be to save the JSON as Parquet and then create a metastore Parquet table. Using Parquet will be much faster for repeated querying. This function might be helpful:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveMetastoreTypes

// sqlContext should be a HiveContext so that the DDL reaches the Hive metastore.
def createParquetTable(name: String, file: String, sqlContext: SQLContext): Unit = {
  import sqlContext._

  // Read the Parquet file only to obtain its schema.
  val rdd = parquetFile(file)
  // Turn each field into a "<name> <hive type>" column definition.
  val schema = rdd.schema.fields.map(f =>
    s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin

  sql(ddl)
  // Read the metastore table with Spark SQL's native Parquet support.
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}

On Tue, Sep 23, 2014 at 10:49 AM, Michael Armbrust mich...@databricks.com wrote:
> You can't directly query JSON tables from the CLI or JDBC server, since temporary tables only live as long as the SparkContext. This PR will eventually (targeted for 1.2) let you do what you want in pure SQL: https://github.com/apache/spark/pull/2475
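The string-building part of createParquetTable can be tried without a cluster. A minimal sketch, assuming the schema has already been reduced to (column name, Hive type) pairs, which is what calling HiveMetastoreTypes.toMetastoreType on each field yields in the function above; `ParquetDdl.build` is a hypothetical name:

```scala
// Hypothetical helper mirroring the DDL construction in createParquetTable:
// (column name, Hive type) pairs in, CREATE EXTERNAL TABLE statement out.
object ParquetDdl {
  def build(name: String, location: String, columns: Seq[(String, String)]): String = {
    // One "<name> <type>" definition per column, comma-separated and indented.
    val schema = columns.map { case (col, hiveType) => s"$col $hiveType" }.mkString(",\n  ")
    s"""|CREATE EXTERNAL TABLE $name (
        |  $schema
        |)
        |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
        |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
        |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
        |LOCATION '$location'""".stripMargin
  }
}
```

stripMargin runs after interpolation, so it also sees the column lines; they are left alone because they do not start with a margin character.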
Re: Spark SQL CLI
Hi Gaurav,

Can you put hive-site.xml in conf/ and try again?

Thanks,
Yin

On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:
> Hi,
>
> I have been using the Spark shell to execute all my SQL. I am connecting to Cassandra, converting the data to JSON and then running queries on it. I am using HiveContext (and not SQLContext) because of its explode functionality.
>
> I want to see how I can use the Spark SQL CLI to run queries directly on a saved table. I see metastore and metastore_db getting created in the Spark bin directory (my hive context is LocalHiveContext). I tried executing queries in the spark-sql CLI after putting in a hive-site.xml with the metastore and metastore_db directories set to the ones in Spark bin, but it doesn't seem to be working. I am getting org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test_tbl.
>
> Is this possible?
>
> Regards,
> Gaurav
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
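With the default embedded Derby metastore, Hive finds its catalog through the javax.jdo.option.ConnectionURL property, whose default is jdbc:derby:;databaseName=metastore_db;create=true. That relative name is by default resolved against the current working directory, which is why launching spark-sql from a different directory can show no tables at all. A hive-site.xml in conf/ can name an absolute path instead. The sketch below only builds that property text; `MetastoreConfig` and its methods are hypothetical, and which directory actually holds the tables (metastore or metastore_db) depends on the context that created them:

```scala
import java.io.File

// Hypothetical helper: builds the hive-site.xml property that points the
// metastore at an embedded Derby database stored in `dir`.
object MetastoreConfig {
  // Absolute path, so the result no longer depends on the launch directory.
  def derbyUrl(dir: File): String =
    s"jdbc:derby:;databaseName=${dir.getCanonicalPath};create=true"

  def hiveSiteProperty(dir: File): String =
    s"""|<property>
        |  <name>javax.jdo.option.ConnectionURL</name>
        |  <value>${derbyUrl(dir)}</value>
        |</property>""".stripMargin
}
```

Note that embedded Derby allows only one process to open the database at a time, so the Spark shell that created it may need to be stopped before spark-sql can connect.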