As far as I know, REFRESH TABLE only works for Spark SQL data source tables, and what you have here is a Hive table.
Can you try to create a table like:

  CREATE TEMPORARY TABLE parquetTable (a int, b string)
  USING org.apache.spark.sql.parquet.DefaultSource
  OPTIONS (
    path '/root_path'
  )

and then

  df2.write.parquet("hdfs://root_path/test_table/key=2")

…

Cheng

From: Jerrick Hoang [mailto:jerrickho...@gmail.com]
Sent: Tuesday, August 11, 2015 2:15 PM
To: user
Subject: Refresh table

Hi all,

I'm a little confused about how refresh table (SPARK-5833) should work. I did the following:

  val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
  df1.write.parquet("hdfs://<path>/test_table/key=1")

Then I created an external table:

  CREATE EXTERNAL TABLE `tmp_table` (
    `single` int,
    `double` int)
  PARTITIONED BY (
    `key` string)
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
  STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
  LOCATION
    'hdfs://<path>/test_table/'

Then I added the partition to the table with

  alter table tmp_table add partition (key=1) location 'hdfs://..

Then I wrote a new partition with a different schema:

  val df2 = sc.makeRDD(1 to 5).map(i => (i, i * 3)).toDF("single", "triple")
  df2.write.parquet("hdfs://<path>/test_table/key=2")

and added that partition to the table with another `alter table ...`. But after running `refresh table tmp_table`, `describe table` still doesn't pick up the new column `triple`. Can someone explain how partition discovery and schema merging are supposed to interact with refresh table?

Thanks
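For comparison, partition discovery and schema merging do work when the root path is read directly through the built-in parquet data source, bypassing the Hive metastore. A minimal spark-shell sketch (the paths are placeholders, and `sc`/`sqlContext` are assumed to come from a standard shell session; the `mergeSchema` read option is the parquet source's schema-merging switch):

```scala
// Sketch only: write two partitions with different schemas, then read
// the root path through the parquet data source. Partition discovery
// picks up `key` from the directory names; with mergeSchema enabled,
// the reader unions the per-file schemas instead of taking one of them.
import sqlContext.implicits._

val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs://<path>/test_table/key=1")

val df2 = sc.makeRDD(1 to 5).map(i => (i, i * 3)).toDF("single", "triple")
df2.write.parquet("hdfs://<path>/test_table/key=2")

val merged = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("hdfs://<path>/test_table")

// The merged schema should contain single, double, triple, plus the
// discovered partition column key; rows from key=1 have triple = null
// and rows from key=2 have double = null.
merged.printSchema()
```

The Hive-table path behaves differently because `describe table` reports the schema stored in the metastore, which is fixed at CREATE TABLE time rather than re-derived from the parquet files.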