Hi all, I'm trying to understand how Parquet partitioning works.
What is the recommended partitioning scheme for event data that will be queried primarily by date? I assume that partitioning by year and month would be optimal, but is it?

Let's say I have data that looks like this:

application,status,date,message
kafka,down,2017-03-23 04:53,zookeeper is not available

Would I have to create new columns for year and month? For example:

application,status,date,message,year,month
kafka,down,2017-03-23 04:53,zookeeper is not available,2017,03

And then perform a CTAS using the year and month columns in the PARTITION BY clause?

Thanks
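To make the idea concrete, here is a minimal Python sketch (standard library only) of deriving `year` and `month` columns from the `date` field and grouping rows by those two values. It assumes the date format is `YYYY-MM-DD HH:MM`, as in the example row. The helper names `add_year_month` and `partition_rows` and the `year=YYYY/month=MM` key are hypothetical and used only for illustration. They do not show how any particular engine lays out Parquet files on disk.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Sample data in the shape from the question (header + one event).
SAMPLE = """application,status,date,message
kafka,down,2017-03-23 04:53,zookeeper is not available
"""


def add_year_month(row):
    """Return a copy of the row with derived 'year' and 'month' columns.

    Assumes the 'date' column uses the format YYYY-MM-DD HH:MM.
    """
    ts = datetime.strptime(row["date"], "%Y-%m-%d %H:%M")
    out = dict(row)
    out["year"] = f"{ts.year:04d}"
    out["month"] = f"{ts.month:02d}"  # zero-padded so "03" sorts before "10"
    return out


def partition_rows(rows):
    """Group rows by a hypothetical 'year=YYYY/month=MM' partition key."""
    partitions = defaultdict(list)
    for row in rows:
        enriched = add_year_month(row)
        key = f"year={enriched['year']}/month={enriched['month']}"
        partitions[key].append(enriched)
    return dict(partitions)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
parts = partition_rows(rows)
for key, members in parts.items():
    print(key, len(members))
```

Zero-padding the month keeps partition values in lexical order, which matters if the partition values are ever compared or listed as strings.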
