Coming from an RDBMS background, I tend to treat columns in Hive much like
those in an RDBMS table. For example, if a table is created in Hive as
Parquet, I will use VARCHAR(30) for a column that is defined as VARCHAR(30)
in the source. If a column is defined as TEXT in the RDBMS table, I use
STRING in Hive, which I believe has a maximum size of 2GB.
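
To make the mapping concrete, here is a minimal Python sketch of the rule
described above. The function names, the table name and the column names are
made up for illustration. The fallback to STRING for lengths outside Hive's
VARCHAR range (1 to 65535) is my own assumption about how one might handle
that case, not something taken from an existing tool.

```python
import re

# Upper bound on VARCHAR length in Hive.
HIVE_VARCHAR_MAX = 65535


def to_hive_type(source_type: str) -> str:
    """Map a source RDBMS column type to a Hive type (illustrative only)."""
    t = source_type.strip().upper()
    m = re.fullmatch(r"VARCHAR\s*\(\s*(\d+)\s*\)", t)
    if m:
        length = int(m.group(1))
        # Keep the declared length when Hive can represent it.
        if 1 <= length <= HIVE_VARCHAR_MAX:
            return f"VARCHAR({length})"
        # Assumption: fall back to STRING when the length is out of range.
        return "STRING"
    if t == "TEXT":
        # Unbounded text in the source becomes STRING in Hive.
        return "STRING"
    raise ValueError(f"no mapping defined for source type: {source_type}")


def build_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a CREATE TABLE ... STORED AS PARQUET statement as a string."""
    cols = ",\n".join(f"  {name} {to_hive_type(src)}" for name, src in columns)
    return f"CREATE TABLE {table} (\n{cols}\n)\nSTORED AS PARQUET"


if __name__ == "__main__":
    # Hypothetical source table with one VARCHAR(30) and one TEXT column.
    print(build_ddl("customer", [("name", "VARCHAR(30)"), ("notes", "TEXT")]))
```

The sketch only covers the two source types mentioned here; any other type
raises an error rather than guessing at a mapping.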

My view is that it is more efficient storage-wise to create the Hive table
with VARCHAR columns as opposed to STRING.

I have not really seen any performance difference whether one uses VARCHAR
or STRING. However, I believe there is a reason why Hive has VARCHAR as
opposed to just STRING.

What is the thread view on this?

Thanks


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
