As per this document: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS), Hive CTAS has the restriction that the target table cannot be a partitioned table. Tejas also pointed out that you need not specify the column information, as it is derived from the result of the SELECT statement, so the correct way to write a CTAS is like Hive's CTAS example:
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

It works well in Spark/beeline also.

Thanks,
Yucai

From: Younes Naguib [mailto:younes.nag...@tritondigital.com]
Sent: Wednesday, January 27, 2016 2:33 AM
To: 'Tejas Patil' <tejas.patil...@gmail.com>; 'yuzhih...@gmail.com' <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: RE: ctas fails with "No plan for CreateTableAsSelect"

It seems that for partitioned tables, you need to create the table first, and then run an INSERT INTO the table to take advantage of dynamic partition allocation. That worked for me.

@Ted I just realized you were asking for a complete stack trace:

2016-01-26 15:36:04 ERROR SparkExecuteStatementOperation:95 - Error executing query, currentState RUNNING, java.lang.AssertionError: assertion failed: No plan for CreateTableAsSelect HiveTable(Some(default),tab1,ArrayBuffer(HiveColumn(col1,timestamp,null), HiveColumn(col2,string,null), HiveColumn(col3,int,null), HiveColumn(col4,int,null), HiveColumn(overflow,array<string>,null)),ArrayBuffer(HiveColumn(year,int,null), HiveColumn(month,int,null), HiveColumn(day,int,null)),Map(),Map(),ManagedTable,Some(hdfs://localhost:9000/tab1),Some(org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat),Some(org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat),Some(org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe),None), false
{Huge explain plan}
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Thanks for all the suggestions.

From: Younes Naguib [mailto:younes.nag...@tritondigital.com]
Sent: January-26-16 11:42 AM
To: 'Tejas Patil'
Cc: user@spark.apache.org
Subject: RE: ctas fails with "No plan for CreateTableAsSelect"

The destination table is partitioned.
If I don't specify the columns, I get:

Error: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Partition column name year conflicts with table columns. (state=,code=0)

younes

From: Tejas Patil [mailto:tejas.patil...@gmail.com]
Sent: January-26-16 11:39 AM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: ctas fails with "No plan for CreateTableAsSelect"

In CTAS, you should not specify the column information, as it is derived from the result of the SELECT statement. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)

~tejasp

On Tue, Jan 26, 2016 at 9:48 PM, Younes Naguib <younes.nag...@tritondigital.com> wrote:

The CTAS works when not using partitions or not defining columns. Ex:

Create table default.tab1
stored as parquet
location 'hdfs://mtl2-alabs-dwh01.streamtheworld.net:9000/younes/geo_location_enrichment'
as Select * from tab2

works.

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: January-26-16 11:11 AM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: ctas fails with "No plan for CreateTableAsSelect"

Maybe try enabling the following (false by default): "spark.sql.hive.convertCTAS"

doc = "When true, a table created by a Hive CTAS statement (no USING clause) will be " +
  "converted to a data source table, using the data source set by spark.sql.sources.default.")

FYI

On Tue, Jan 26, 2016 at 8:06 AM, Younes Naguib <younes.nag...@tritondigital.com> wrote:

SQL on beeline and connecting to the thriftserver.
Younes

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: January-26-16 11:05 AM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: ctas fails with "No plan for CreateTableAsSelect"

Were you using HiveContext or SQLContext? Can you show the complete stack trace?

Thanks

On Tue, Jan 26, 2016 at 8:00 AM, Younes Naguib <younes.nag...@tritondigital.com> wrote:

Hi,

I'm running CTAS, and it fails with "Error: java.lang.AssertionError: assertion failed: No plan for CreateTableAsSelect HiveTable....". Here is what my SQL looks like:

Create tbl (
  Col1 timestamp,
  Col2 string,
  Col3 int,
  .....
)
partitioned by (year int, month int, day int)
stored as parquet
location 'hdfs://...
as select..... ;

The SELECT by itself works. I'm running Spark 1.6.

Younes
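[For readers landing on this thread later: the two-step workaround described upthread, creating the partitioned table first and then inserting with dynamic partitioning, might look roughly like the sketch below. The table and column names are illustrative only, not the actual schema from this thread.]

```sql
-- Enable dynamic partitioning before the insert (standard Hive settings)
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Step 1: create the partitioned target table explicitly,
-- since CTAS cannot create a partitioned table
CREATE TABLE tab1 (
  col1 TIMESTAMP,
  col2 STRING,
  col3 INT
)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET;

-- Step 2: insert from the source query; with dynamic partitioning,
-- the partition columns must come last in the SELECT list so they
-- map onto (year, month, day)
INSERT INTO TABLE tab1 PARTITION (year, month, day)
SELECT col1, col2, col3, year, month, day
FROM tab2;
```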