[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2017-09-11 Thread Rajesh Chandramohan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162522#comment-16162522
 ] 

Rajesh Chandramohan commented on SPARK-14927:
-

[~ctang.ma], I was talking here about Hive 1.2 and Spark 2.1.0. In our 
environment we have not yet moved to Hive 2.0.

*Once Spark updates the Hive table, we can no longer access the table via Hive.*
spark-sql> insert into table zeta_dev_gdw_tables.rajesh_user1 values (1002);

Once the above command executes, the HDFS file structure changes, and access 
via Hive then throws an error.

--
hive>  select * from zeta_dev_gdw_tables.rajesh_user1  ;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: 
Can not read value at 0 in block 0 in file 
hdfs://cluster/user/hive/warehouse/zeta_dev_gdw_tables.db/rajesh_user1/part-0-d4d0b321-3e7e-4525-9de0-d64fb223e9b2.snappy.parquet
Time taken: 1.495 seconds
hive>
  
  
Hive log:
Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in 
block 0 in file 
hdfs://cluster/user/hive/warehouse/zeta_dev_gdw_tables.db/rajesh_user1/part-0-d4d0b321-3e7e-4525-9de0-d64fb223e9b2.snappy.parquet
  at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
  at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:122)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
  at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:682)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:328)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:450)
  ... 15 more
Caused by: java.lang.UnsupportedOperationException: 
org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$1

Should be related to this bug: HIVE-15082
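
If the unreadable values stem from Spark's newer Parquet encodings (the comment 
above points at HIVE-15082 on the Hive side), one thing to try from the Spark 
side is writing legacy-format Parquet so that Hive 1.2 readers can decode it. 
This is a sketch, not a confirmed fix; spark.sql.parquet.writeLegacyFormat is a 
real Spark 2.x option, but whether it helps depends on the column types, and 
files already written are unaffected:

{code}
// Sketch, assuming the decode failure comes from Spark's non-legacy Parquet
// encodings: ask Spark to write Hive-1.x-readable Parquet. Only inserts
// performed after the setting is changed are affected.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
spark.sql("insert into table zeta_dev_gdw_tables.rajesh_user1 values (1002)")
{code}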

> DataFrame. saveAsTable creates RDD partitions but not Hive partitions
> -
>
> Key: SPARK-14927
> URL: https://issues.apache.org/jira/browse/SPARK-14927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.1
> Environment: Mac OS X 10.11.4 local
>Reporter: Sasha Ovsankin
>
> This is a followup to 
> http://stackoverflow.com/questions/31341498/save-spark-dataframe-as-dynamic-partitioned-table-in-hive. 
> I tried to use the suggestions in the answers but couldn't make it work in 
> Spark 1.6.1.
> I am trying to create partitions programmatically from a `DataFrame`. Here is 
> the relevant code (adapted from a Spark test):
> hc.setConf("hive.metastore.warehouse.dir", "tmp/tests")
> //hc.setConf("hive.exec.dynamic.partition", "true")
> //hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
> hc.sql("create database if not exists tmp")
> hc.sql("drop table if exists tmp.partitiontest1")
> Seq(2012 -> "a").toDF("year", "val")
>   .write
>   .partitionBy("year")
>   .mode(SaveMode.Append)
>   .saveAsTable("tmp.partitiontest1")
> hc.sql("show partitions tmp.partitiontest1").show
> Full file is here: 
> https://gist.github.com/SashaOv/7c65f03a51c7e8f9c9e018cd42aa4c4a
> I get the error that the table is not partitioned:
> ==
> HIVE FAILURE OUTPUT
> ==
> SET hive.support.sql11.reserved.keywords=false
> SET hive.metastore.warehouse.dir=tmp/tests
> OK
> OK
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Table tmp.partitiontest1 is not a 
> partitioned table
> ==
> It looks like the root cause is that 
> `org.apache.spark.sql.hive.HiveMetastoreCatalog.newSparkSQLSpecificMetastoreTable`
> always creates the table with empty partitions.
> Any help to move this forward is appreciated.






[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2017-08-14 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126026#comment-16126026
 ] 

Chaoyu Tang commented on SPARK-14927:
-

I was not able to reproduce the issue as reported using Spark SQL 2.1.1 and 
Hive 2.1.1 on EMR. A table created in Spark SQL (partitioned or 
non-partitioned) could be accessed in Hive without any problems. I also 
examined the table/partition/column metadata in HMS backend tables such as 
TBLS, SDS, PARTITION_KEY_VALS, and COLUMNS_V2, and so far have not seen any 
discrepancies between tables created from Spark and from Hive.
[~rajeshc] I wonder if you are still having the issue; if so, could you share 
your sample so that I can help look into it? Thanks.




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2017-08-10 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121931#comment-16121931
 ] 

Chaoyu Tang commented on SPARK-14927:
-

[~rajeshc], could you provide your example here?




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2017-07-29 Thread Raul Saez Tapia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106058#comment-16106058
 ] 

Raul Saez Tapia commented on SPARK-14927:
-

*SOMASUNDARAM SUDALAIMUTHU* I am sorry, but I am off the project now, so I 
can't check this.




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2017-07-28 Thread SOMASUNDARAM SUDALAIMUTHU (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105004#comment-16105004
 ] 

SOMASUNDARAM SUDALAIMUTHU commented on SPARK-14927:
---

Is this fixed in version 2.0?




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2017-07-06 Thread Rajesh Chandramohan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075996#comment-16075996
 ] 

Rajesh Chandramohan commented on SPARK-14927:
-

A table created via spark-sql (Spark 2.1.0) with partitions and stored as 
Parquet is not accessible through the Hive CLI (Hive 1.2). We can see this kind 
of warning:

"Persisting bucketed data source relation `db1`.`rajesh_user` into Hive 
metastore in Spark SQL specific format, which is NOT compatible with Hive:"

The workaround we tried, creating the table via Hive and then inserting data 
via spark-sql, does not hit the incompatibility. What would be the right fix?
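
A minimal sketch of that workaround on Spark 2.1, assuming a plain Parquet 
table; the database, table, and column names below are illustrative, and 
"staging" is an assumed temp view over the DataFrame, not anything from our 
actual environment:

{code}
// Hive-first workaround sketch: let HiveQL own the table definition, then use
// Spark only for the inserts.
spark.sql("""create table if not exists db1.events (name string)
             partitioned by (ds string) stored as parquet""")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
df.createOrReplaceTempView("staging")
spark.sql("insert into table db1.events partition(ds) select name, ds from staging")
{code}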






[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-10-27 Thread Raul Saez Tapia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610900#comment-15610900
 ] 

Raul Saez Tapia commented on SPARK-14927:
-

[~xwu0226] your example works fine for me with Spark 1.6.1. However, it does 
not work when we use a UDT.

My DataFrame shows:
```
scala> model_date.toDF.show
+--------+--------------------+
|    date|               model|
+--------+--------------------+
|20160610|[aa.bb.spark.types.PersonWrapper@8542...|
|20160610|[aa.bb.spark.types.PersonWrapper@8831...|
...
...
+--------+--------------------+
```

I have created the table with some specific properties, so I can show how the 
table is defined and how the PersonType UDT maps to the table schema:
```
create table model_orc (`model` 
struct<person:struct<id:int,name:string>>) PARTITIONED BY (`date` 
string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' WITH 
SERDEPROPERTIES ('path'='hdfs:///user/raulsaez/model_orc') STORED AS 
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 
'hdfs:///user/raulsaez/model_orc' 
TBLPROPERTIES('spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.part.0'='{
 \"type\":\"struct\",\"fields\":[{ \"name\":\"personWrapper\",\"type\":{ 
\"type\":\"udt\",\"class\":\"aa.bb.spark.types.PersonType\",\"pyClass\":null,\"sqlType\":{
 \"type\":\"struct\",\"fields\":[{ \"name\":\"id\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{} },{ 
\"name\":\"name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{} }] } 
},\"nullable\":true,\"metadata\":{} }] }')
```

Now we insert data into the table:
```
scala> hiveContext.sql("insert into model_orc partition(date=20160610) select 
model,date from dfJune")
org.apache.spark.sql.AnalysisException: cannot resolve 'cast(model as 
struct<person:struct<id:int,name:string>>)'
due to data type mismatch: cannot cast
StructType(StructField(personWrapper,,true)),true)
to
StructType(StructField(person,StructType(StructField(id,IntegerType,true),StructField(name,StringType,true),),true)
```
I have the same issue with both Parquet and ORC.
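
A hedged workaround sketch for the UDT mismatch: flatten the wrapper into plain 
case classes before writing, so the DataFrame schema matches the table's struct 
column. The person/id/name accessors on PersonWrapper are assumptions about the 
user types, not something shown in this thread:

```
// Assumed shape: model_date is a collection of (date, PersonWrapper) pairs,
// and implicits for toDF are in scope (as in spark-shell).
case class PersonRow(id: Int, name: String)
case class ModelRow(person: PersonRow)
val flattened = model_date
  .map { case (date, w) => (date, ModelRow(PersonRow(w.person.id, w.person.name))) }
  .toDF("date", "model")
// With no UDT left in the schema, the cast should line up with the DDL above.
flattened.write.format("orc").partitionBy("date").saveAsTable("model_orc_flat")
```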



And if I persist the DataFrame as a table with ORC:
```
model_date.toDF.write.format("orc").partitionBy("date").saveAsTable("model_orc")
```
Or even if I persist it as an ORC file:
```
scala> 
model_date.toDF.write.mode(SaveMode.Append).format("orc").partitionBy("date").save("model_orc_asTable")
```

I get the ClassCastException:
```
Caused by: java.lang.ClassCastException: aa.bb.spark.types.PersonType cannot be 
cast to org.apache.spark.sql.types.StructType
  at org.apache.spark.sql.hive.HiveInspectors$class.wrap(HiveInspectors.scala:557)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.wrap(OrcRelation.scala:66)
  at org.apache.spark.sql.hive.HiveInspectors$class.wrap(HiveInspectors.scala:568)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.wrap(OrcRelation.scala:66)
  at org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrap$1.apply(HiveInspectors.scala:590)
  at org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrap$1.apply(HiveInspectors.scala:589)
  at org.apache.spark.sql.catalyst.util.ArrayData.foreach(ArrayData.scala:135)
  at org.apache.spark.sql.hive.HiveInspectors$class.wrap(HiveInspectors.scala:589)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.wrap(OrcRelation.scala:66)
  at org.apache.spark.sql.hive.HiveInspectors$class.wrap(HiveInspectors.scala:568)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.wrap(OrcRelation.scala:66)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.wrapOrcStruct(OrcRelation.scala:128)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.writeInternal(OrcRelation.scala:139)
  at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:358)
  ... 8 more
```


If I persist the DataFrame as a table with Parquet:
```
scala> 
model_date.toDF.write.mode(SaveMode.Append).format("parquet").partitionBy("date").saveAsTable("model_parquet_asTable")
16/10/27 09:39:24 WARN HiveContext$$anon$2: Persisting partitioned data source 
relation `model_parquet_asTable` into Hive metastore in Spark SQL specific 
format, which is NOT compatible with Hive. Input path(s):
hdfs://dev-nameservice/apps/hive/warehouse/model_parquet_astable
...
scala> hiveContext.sql("select * from model_parquet_asTable where 
date=20160610").show
+--------------------+--------+
|               model|    date|
+--------------------+--------+
|[aa.bb.spark.types.PersonWrapper@8542...|20160610|
|[aa.bb.spark.types.PersonWrapper@8831...|20160610|
|[aa.bb.spark.types.PersonWrapper@3661...|20160610|
...
+--------------------+--------+
only showing top 20 rows

```

From Hive I can see the table, though it looks 

[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-08-25 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438457#comment-15438457
 ] 

Xin Wu commented on SPARK-14927:


[~smilegator] Do you think the work you have in progress will fix this issue? 
The goal is to allow Hive to see the partitions created by Spark SQL from a 
DataFrame.




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-08-25 Thread Corentin Kerisit (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437861#comment-15437861
 ] 

Corentin Kerisit commented on SPARK-14927:
--

Any guidance on how we could help get this resolved?




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-07-12 Thread Aman Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372747#comment-15372747
 ] 

Aman Goyal commented on SPARK-14927:


Is there any planned fix for this JIRA, or any other workaround? The "create 
table" workaround gets tricky for a generic API that wants to write a 
partitioned DataFrame to Hive storage. Alternatively, can anyone point to a 
generic Hive query builder for a Spark SQL StructType?
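
There is no generic builder in this thread, but as a sketch of the idea: derive 
the Hive DDL from the DataFrame's StructType and run it before the write. This 
is illustrative only; it covers a few primitive types plus nested structs and 
falls back to DataType.simpleString, which happens to match Hive's type names 
in many (but not all) cases:

{code}
import org.apache.spark.sql.types._

// Map a Spark SQL type to a Hive column type (partial, illustrative).
def hiveType(dt: DataType): String = dt match {
  case IntegerType => "int"
  case LongType    => "bigint"
  case DoubleType  => "double"
  case StringType  => "string"
  case StructType(fields) =>
    fields.map(f => s"${f.name}:${hiveType(f.dataType)}").mkString("struct<", ",", ">")
  case other => other.simpleString // fallback; verify per type before relying on it
}

// Build "create table ... partitioned by ... stored as parquet" from a schema.
def createTableDdl(db: String, table: String, schema: StructType, partCols: Seq[String]): String = {
  val (parts, cols) = schema.fields.partition(f => partCols.contains(f.name))
  val colSpec  = cols.map(f => s"`${f.name}` ${hiveType(f.dataType)}").mkString(", ")
  val partSpec = parts.map(f => s"`${f.name}` ${hiveType(f.dataType)}").mkString(", ")
  s"create table `$db`.`$table` ($colSpec) partitioned by ($partSpec) stored as parquet"
}

// e.g. hc.sql(createTableDdl("tmp", "partitiontest1", df.schema, Seq("year")))
{code}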




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-05-02 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267576#comment-15267576
 ] 

Xin Wu commented on SPARK-14927:


Right now, when a datasource table is created with partitions, it is not a 
Hive-compatible table.

So you may need to create the table first, like {code}create table tmp.ptest3 
(val string) partitioned by (year int) stored as parquet location '' {code} 
and then insert into it from a temp table derived from the DataFrame. Here is 
something I tried:
{code}
scala> df.show
+----+---+
|year|val|
+----+---+
|2012|  a|
|2013|  b|
|2014|  c|
+----+---+

scala> val df1 = spark.sql("select * from t000 where year = 2012")
df1: org.apache.spark.sql.DataFrame = [year: int, val: string]

scala> df1.registerTempTable("df1")

scala> spark.sql("insert into tmp.ptest3 partition(year=2012) select * from 
df1")

scala> val df2 = spark.sql("select * from t000 where year = 2013")
df2: org.apache.spark.sql.DataFrame = [year: int, val: string]

scala> df2.registerTempTable("df2")

scala> spark.sql("insert into tmp.ptest3 partition(year=2013) select val from 
df2")
16/05/02 14:47:34 WARN log: Updating partition stats fast for: ptest3
16/05/02 14:47:34 WARN log: Updated size to 327
res54: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show partitions tmp.ptest3").show
+---------+
|   result|
+---------+
|year=2012|
|year=2013|
+---------+

{code}

This is a bit hacky, though; I hope someone has a better solution for your 
problem. This was on Spark 2.0; try whether 1.6 can do the same.
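
A variant of the same idea that avoids one temp table per partition value, 
assuming Hive dynamic partitioning is enabled (hive.exec.dynamic.partition and 
hive.exec.dynamic.partition.mode are standard Hive settings; the table names 
reuse the session above):

{code}
// Sketch: a single dynamic-partition insert instead of per-partition inserts.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
df.registerTempTable("t_all")
// The partition column must come last in the select list.
spark.sql("insert into table tmp.ptest3 partition(year) select val, year from t_all")
{code}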




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-05-02 Thread Sasha Ovsankin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266836#comment-15266836
 ] 

Sasha Ovsankin commented on SPARK-14927:


Thanks for your reply, [~xwu0226]. So how can I access partition information in 
2.0? On a related note, I would like the resulting files to be accessible from 
Hive; what would be the recommended way of doing that?




[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-04-30 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265591#comment-15265591
 ] 

Xin Wu commented on SPARK-14927:


Since Spark 2.0.0 has moved around a lot of stuff, including splitting 
HiveMetaStoreCatalog into two files (for resolving and creating tables, 
respectively), I tried this on Spark 2.0.0:

{code}scala> spark.sql("create database if not exists tmp")
16/04/30 19:59:12 WARN ObjectStore: Failed to get database tmp, returning 
NoSuchObjectException
res23: org.apache.spark.sql.DataFrame = []

scala> 
df.write.partitionBy("year").mode(SaveMode.Append).saveAsTable("tmp.tmp1")
16/04/30 19:59:50 WARN CreateDataSourceTableUtils: Persisting partitioned data 
source relation `tmp`.`tmp1` into Hive metastore in Spark SQL specific format, 
which is NOT compatible with Hive. Input path(s): 
file:/home/xwu0226/spark/spark-warehouse/tmp.db/tmp1

scala> spark.sql("select * from tmp.tmp1").show
+---++
|val|year|
+---++
|  a|2012|
+---++
{code}

For a datasource table created as above, Spark SQL creates it as a Hive 
internal table that is not compatible with Hive. Spark SQL puts the 
partition-column information (along with other things, such as the column 
schema and bucket/sort columns) into serdeInfo.parameters. When querying the 
table, Spark SQL resolves it and parses that information back out of 
serdeInfo.parameters.
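
One way to see what was persisted, as a sketch (describe formatted is standard; 
the exact parameter names Spark uses can vary by version):

{code}
// Inspect the table Spark created; the Spark-specific schema shows up among
// the table/serde parameters rather than as ordinary Hive columns.
spark.sql("describe formatted tmp.tmp1").show(200, false)
{code}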

Spark 2.0.0 no longer passes this command to Hive (most DDL commands now run 
natively in Spark SQL), so SHOW PARTITIONS does not currently support 
datasource tables:

{code}
scala> spark.sql("show partitions tmp.tmp1").show
org.apache.spark.sql.AnalysisException: SHOW PARTITIONS is not allowed on a 
datasource table: tmp.tmp1;
  at 
org.apache.spark.sql.execution.command.ShowPartitionsCommand.run(commands.scala:196)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:62)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:60)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:113)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:113)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:132)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:129)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:112)
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
  at org.apache.spark.sql.Dataset.(Dataset.scala:186)
  at org.apache.spark.sql.Dataset.(Dataset.scala:167)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:62)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:529)
  ... 48 elided
{code}

Hope this helps. 
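
As a stopgap for listing partitions of such a datasource table, one sketch is 
to read the files back and take the distinct partition values (the warehouse 
path is the one from the warning above):

{code}
// Workaround sketch: derive the partition list from the data itself.
spark.read.parquet("file:/home/xwu0226/spark/spark-warehouse/tmp.db/tmp1")
  .select("year").distinct().show()
{code}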


[jira] [Commented] (SPARK-14927) DataFrame. saveAsTable creates RDD partitions but not Hive partitions

2016-04-26 Thread Sasha Ovsankin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259014#comment-15259014
 ] 

Sasha Ovsankin commented on SPARK-14927:


It seems that the workaround is not to let saveAsTable create the table, but 
rather to create it yourself before writing to it, like so:

{code}
hc.sql("create external table tmp.partitiontest1(val string) partitioned by 
(year int)")

Seq(2012 -> "a", 2013 -> "b", 2014 -> "c").toDF("year", "val")
  .write
  .partitionBy("year")
  .mode(SaveMode.Append)
  .saveAsTable("tmp.partitiontest1")
{code}
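
A hypothetical follow-up check, assuming the append above succeeded: because 
Hive owns the table definition here, the partition metadata should now be 
visible where the original report saw the DDLTask error:

{code}
// With the Hive-defined table, SHOW PARTITIONS should list what saveAsTable wrote.
hc.sql("show partitions tmp.partitiontest1").show
// Expected (illustrative): year=2012, year=2013, year=2014
{code}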
