Refreshing a table only works for Spark SQL data source tables, to my understanding; apparently what you have here is a Hive table.

Can you try creating a table like this instead:

        CREATE TEMPORARY TABLE parquetTable (a int, b string)
        USING org.apache.spark.sql.parquet.DefaultSource
        OPTIONS (
          path '/root_path'
        )

And then df2.write.parquet("hdfs://root_path/test_table/key=2") …
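Putting it together, roughly (just a sketch; sqlContext needs to be a HiveContext, and I'm assuming the OPTIONS path should point at the same root directory the partition directories are written under):

        // Register the directory as a Spark SQL data source table, so that
        // REFRESH TABLE and Parquet schema merging apply to it
        sqlContext.sql(
          """CREATE TEMPORARY TABLE parquetTable (a int, b string)
            |USING org.apache.spark.sql.parquet.DefaultSource
            |OPTIONS (
            |  path 'hdfs://root_path/test_table'
            |)""".stripMargin)

        // write a new partition directory under the table's root path ...
        df2.write.parquet("hdfs://root_path/test_table/key=2")

        // ... then invalidate the cached metadata so the new files show up
        sqlContext.sql("REFRESH TABLE parquetTable")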

Cheng

From: Jerrick Hoang [mailto:jerrickho...@gmail.com]
Sent: Tuesday, August 11, 2015 2:15 PM
To: user
Subject: Refresh table

Hi all,

I'm a little confused about how refresh table (SPARK-5833) is supposed to work, so I did the following:

val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")

df1.write.parquet("hdfs://<path>/test_table/key=1")

Then I created an external table with:

CREATE EXTERNAL TABLE `tmp_table` (
  `single` int,
  `double` int)
PARTITIONED BY (
  `key` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://<path>/test_table/'

Then I added the partition to the table by `alter table tmp_table add partition 
(key=1) location 'hdfs://..`
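Spelled out (with the same <path> placeholder as above), that was essentially:

sqlContext.sql(
  """ALTER TABLE tmp_table ADD PARTITION (key=1)
    |LOCATION 'hdfs://<path>/test_table/key=1'""".stripMargin)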

Then I added a new partition with a different schema:

val df2 = sc.makeRDD(1 to 5).map(i => (i, i * 3)).toDF("single", "triple")

df2.write.parquet("hdfs://<path>/test_table/key=2")

And added the new partition to the table with another `alter table ..`:
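That is, roughly (same <path> placeholder as before):

sqlContext.sql("ALTER TABLE tmp_table ADD PARTITION (key=2) LOCATION 'hdfs://<path>/test_table/key=2'")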

But when I ran `refresh table tmp_table` and then `describe table`, it didn't pick up the new column `triple`. Can someone explain how partition discovery and schema merging are supposed to work with refresh table?
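For comparison, I'd expect reading the directory directly through the Parquet data source to merge the two schemas (mergeSchema is the Parquet source's schema-merging option; <path> as above):

// partition discovery should find key=1 and key=2, and the merged schema
// should contain single, double, triple plus the key partition column
val merged = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("hdfs://<path>/test_table")
merged.printSchema()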

Thanks
