In would like to create a parquet with a partition on computed data (without to 
have to put the result of the computation in the parquet) :
The goal is to optimize the parquet for typical expecting queries.

Imaginary example :
CREATE TABLE `mytable` 
PARTITION BY (substr(name,1,1)) AS  
SELECT name, birthdate, birthcity
ORDER BY bithdate;

So, if I do that I obtain a VALIDATION ERROR: Partition column ... is not in 
the SELECT list of CTAS

And the comment of the code of the function 
"public static RelNode qualifyPartitionCol(RelNode input, List<String> 
partitionColumns)" 
confirms that it's not possible actually :
" A partition column is resolved, either (1) the same column appear in the 
select list of CTAS or (2) CTAS has a * in select list"

But what is the reason of this limitation ?
Is there exists any tricks to do it right now, or can we expect an evolution to 
allow this possibilities.

I just imagine to do (with the data of the example)
CREATE TABLE `mytable` 
PARTITION BY (sname) AS  
SELECT substr(name,1,1) sname, name, birthdate, birthcity
ORDER BY bithdate;
Then, next, request each partition file to remove the useless data 
, like
CREATE TABLE `mytable_2/partition_x` 
SELECT name, birthdate, birthcity
ORDER BY bithdate;
but it's not really satisfying...

I would appreciate yours comments,
Regards,

benj

Reply via email to