In would like to create a parquet with a partition on computed data (without to have to put the result of the computation in the parquet) : The goal is to optimize the parquet for typical expecting queries.
Imaginary example : CREATE TABLE `mytable` PARTITION BY (substr(name,1,1)) AS SELECT name, birthdate, birthcity ORDER BY bithdate; So, if I do that I obtain a VALIDATION ERROR: Partition column ... is not in the SELECT list of CTAS And the comment of the code of the function "public static RelNode qualifyPartitionCol(RelNode input, List<String> partitionColumns)" confirms that it's not possible actually : " A partition column is resolved, either (1) the same column appear in the select list of CTAS or (2) CTAS has a * in select list" But what is the reason of this limitation ? Is there exists any tricks to do it right now, or can we expect an evolution to allow this possibilities. I just imagine to do (with the data of the example) CREATE TABLE `mytable` PARTITION BY (sname) AS SELECT substr(name,1,1) sname, name, birthdate, birthcity ORDER BY bithdate; Then, next, request each partition file to remove the useless data , like CREATE TABLE `mytable_2/partition_x` SELECT name, birthdate, birthcity ORDER BY bithdate; but it's not really satisfying... I would appreciate yours comments, Regards, benj
