As per this document: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS), Hive CTAS has the restriction that the target table cannot be a partitioned table. Tejas also pointed out that you need not specify the column information, as it is derived from the result of the SELECT statement, so the correct way to write a CTAS is like Hive's CTAS example:
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

It works well in Spark/beeline also.

Thanks,
Yucai

From: Younes Naguib [mailto:younes.nag...@tritondigital.com]
Sent: Wednesday, January 27, 2016 2:33 AM
To: 'Tejas Patil' <tejas.patil...@gmail.com>; 'yuzhih...@gmail.com' <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: RE: ctas fails with "No plan for CreateTableAsSelect"

It seems that for partitioned tables, you need to create the table first, and then run an INSERT INTO the table to take advantage of dynamic partition allocation. That worked for me.

@Ted I just realized you were asking for a complete stack trace:

2016-01-26 15:36:04 ERROR SparkExecuteStatementOperation:95 - Error executing query, currentState RUNNING, java.lang.AssertionError: assertion failed: No plan for CreateTableAsSelect HiveTable(Some(default),tab1,ArrayBuffer(HiveColumn(col1,timestamp,null), HiveColumn(col2,string,null), HiveColumn(col3,int,null), HiveColumn(col4,int,null), HiveColumn(overflow,array<string>,null)),ArrayBuffer(HiveColumn(year,int,null), HiveColumn(month,int,null), HiveColumn(day,int,null)),Map(),Map(),ManagedTable,Some(hdfs://localhost:9000/tab1),Some(org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat),Some(org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat),Some(org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe),None), false
{Huge explain plan}
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Thanks for all the suggestions.

From: Younes Naguib [mailto:younes.nag...@tritondigital.com]
Sent: January-26-16 11:42 AM
To: 'Tejas Patil'
Cc: user@spark.apache.org
Subject: RE: ctas fails with "No plan for CreateTableAsSelect"

The destination table is partitioned.
If I don't specify the columns, I get:

Error: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Partition column name year conflicts with table columns. (state=,code=0)

younes

From: Tejas Patil [mailto:tejas.patil...@gmail.com]
Sent: January-26-16 11:39 AM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: ctas fails with "No plan for CreateTableAsSelect"

In CTAS, you should not specify the column information, as it is derived from the result of the SELECT statement. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)

~tejasp

On Tue, Jan 26, 2016 at 9:48 PM, Younes Naguib <younes.nag...@tritondigital.com> wrote:

The CTAS works when not using partitions or not defining columns. Ex:

Create table default.tab1
stored as parquet
location 'hdfs://mtl2-alabs-dwh01.streamtheworld.net:9000/younes/geo_location_enrichment'
as Select * from tab2

works.

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: January-26-16 11:11 AM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: ctas fails with "No plan for CreateTableAsSelect"

Maybe try enabling the following (false by default): "spark.sql.hive.convertCTAS"

doc = "When true, a table created by a Hive CTAS statement (no USING clause) will be " +
  "converted to a data source table, using the data source set by spark.sql.sources.default.")

FYI

On Tue, Jan 26, 2016 at 8:06 AM, Younes Naguib <younes.nag...@tritondigital.com> wrote:

SQL on beeline and connecting to the thriftserver.
Younes

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: January-26-16 11:05 AM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: ctas fails with "No plan for CreateTableAsSelect"

Were you using HiveContext or SQLContext? Can you show the complete stack trace?

Thanks

On Tue, Jan 26, 2016 at 8:00 AM, Younes Naguib <younes.nag...@tritondigital.com> wrote:

Hi,

I'm running CTAS, and it fails with "Error: java.lang.AssertionError: assertion failed: No plan for CreateTableAsSelect HiveTable....". Here is what my SQL looks like:

Create tbl (
  Col1 timestamp,
  Col2 string,
  Col3 int,
  .....
)
partitioned by (year int, month int, day int)
stored as parquet
location 'hdfs://...
as select..... ;

The SELECT by itself works. I'm running Spark 1.6.

Younes
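[For readers landing on this thread later: the two-step workaround described upthread, creating the partitioned table first and then inserting with dynamic partitioning, might look roughly like the sketch below. The table and column names are illustrative only, not the actual schema from this thread.]

```sql
-- Enable dynamic partitioning before the insert (standard Hive settings)
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Step 1: create the partitioned target table explicitly,
-- since CTAS cannot create a partitioned table
CREATE TABLE tab1 (
  col1 TIMESTAMP,
  col2 STRING,
  col3 INT
)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET;

-- Step 2: insert from the source query; with dynamic partitioning,
-- the partition columns must come last in the SELECT list so they
-- map onto (year, month, day)
INSERT INTO TABLE tab1 PARTITION (year, month, day)
SELECT col1, col2, col3, year, month, day
FROM tab2;
```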