> Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
> Compliance. Otherwise they seem to be practically the same as String types.

They are nearly identical in storage, but both are slower on the CPU in actual use (CHAR has additional padding code in the hot path).

There is no constant form for those two types, so any string operation, say = 'NONE', gets promoted to

    UDFToString(varcharcol) = 'NONE'

The cast on the column turns off all ORC/Parquet index pushdowns. If you run an explain and notice something similar, expect a significant performance loss.

In general, I see a 2-3x performance degradation with CHAR/VARCHAR when doing constant filter operations, and other issues when joining differently sized operands (Varchar(3) x Varchar(4) would go this route).

The default String types are faster purely because they are the destination type for any up-conversion or constant-folding conversions.

Cheers,
Gopal
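
P.S. A minimal sketch of how one might check for this, assuming an ORC table. The table and column names (t_demo, status_vc, status_s) are made up for illustration, and the exact plan text will vary with the Hive version, so treat this as something to try rather than a guaranteed reproduction.

```sql
-- Hypothetical table: the same value stored once as VARCHAR and once as STRING.
CREATE TABLE t_demo (
  status_vc VARCHAR(10),
  status_s  STRING
) STORED AS ORC;

-- Constant filter on the VARCHAR column. In the plan, look for a cast
-- wrapped around the column, e.g. UDFToString(status_vc) = 'NONE'.
EXPLAIN
SELECT count(*) FROM t_demo WHERE status_vc = 'NONE';

-- Same filter on the STRING column, for comparison. No up-conversion
-- is needed here because STRING is already the destination type.
EXPLAIN
SELECT count(*) FROM t_demo WHERE status_s = 'NONE';
```

If the first plan shows the cast on the column and the second does not, declaring the column as STRING is the simplest way to avoid it.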