Re: VARCHAR or STRING fields in Hive

Mich Talebzadeh Mon, 16 Jan 2017 13:44:28 -0800

It sounds as though the VARCHAR and CHAR types were added to Hive for ANSI
SQL compliance. Otherwise they seem to be practically the same as the STRING
type.
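
To make the semantic difference concrete, here is a minimal Python sketch.
It is a model of the behaviour described in the Hive documentation, not Hive
code, and the helper names to_varchar, to_char and char_equal are made up for
illustration. It assumes that VARCHAR(n) silently truncates longer values to
n characters, that CHAR(n) also pads shorter values with spaces up to n
characters, and that trailing spaces are ignored when CHAR values are
compared. STRING has no declared length, so none of this applies to it.

```python
def to_varchar(value: str, n: int) -> str:
    """Model of VARCHAR(n): keep at most n characters, truncating silently."""
    return value[:n]


def to_char(value: str, n: int) -> str:
    """Model of CHAR(n): truncate to n characters, then pad with spaces to n."""
    return value[:n].ljust(n)


def char_equal(a: str, b: str) -> bool:
    """Model of CHAR comparison: trailing spaces are not significant."""
    return a.rstrip(" ") == b.rstrip(" ")


if __name__ == "__main__":
    print(repr(to_varchar("Talebzadeh", 5)))
    print(repr(to_char("ab", 5)))
    print(char_equal(to_char("ab", 5), to_char("ab", 3)))
```

For example, to_varchar("Talebzadeh", 5) gives "Taleb", whereas a STRING
column would keep the whole value.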


HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 16 January 2017 at 17:15, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Elliot for the insight.
>
> Another issue is that Spark does not support CHAR types, although it does
> support VARCHAR, and one often uses Spark on these tables as well.
>
> This should not really matter. I tend to define CHAR(N) as VARCHAR(N), on
> the assumption that the table ingested into Parquet, say, is already
> filled with data, so it does not really matter whether one defines CHAR or
> VARCHAR here.
>
> For compatibility reasons I would rather use VARCHAR than STRING, to give
> RDBMS developers a more familiar look and feel.
>
> Also, if one uses SNAPPY compression on columns, will VARCHAR as opposed
> to STRING make any difference in terms of storage efficiency?
>
>
> Regards
>
> On 16 January 2017 at 17:00, Elliot West <tea...@gmail.com> wrote:
>
>> Internally it looks as though Hive simply represents CHAR/VARCHAR values
>> using a Java String and so I would not expect a significant change in
>> execution performance. The Hive JIRA suggests that these types were added
>> to provide 'support for more SQL-compliant behavior, such as SQL string
>> comparison semantics, max length, etc.' rather than for performance reasons.
>>
>>    - https://issues.apache.org/jira/browse/HIVE-4844
>>    - https://issues.apache.org/jira/browse/HIVE-5191
>>
>> In terms of storage, I expect it depends on the underlying file format and
>> the types that these values are encoded to. Parquet does appear to
>> support the specific encoding of both CHAR and VARCHAR; however, I'm
>> skeptical that there would be significant storage efficiencies gained by
>> using the CHAR types over String for comparable values. I'd be keen to
>> hear otherwise.
>>
>>    - https://issues.apache.org/jira/browse/HIVE-7735
>>
>> Thanks,
>>
>> Elliot.
>>
>> On 16 January 2017 at 15:37, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>>
>>> Coming from a DBMS background, I tend to treat the columns in Hive like
>>> those of an RDBMS table. For example, if a table is created in Hive as
>>> Parquet, I will use VARCHAR(30) for a column that is defined as
>>> VARCHAR(30) in the source. If a column is defined as TEXT in the RDBMS
>>> table, I use STRING in Hive, which I believe has a max size of 2GB.
>>>
>>> My view is that it is more efficient storage-wise to have the Hive table
>>> created with VARCHAR as opposed to STRING.
>>>
>>> I have not really seen any performance difference between VARCHAR and
>>> STRING. However, I believe there is a reason why Hive has VARCHAR in
>>> addition to STRING.
>>>
>>> What is the thread's view on this?
>>>
>>> Thanks
>>>
>>
>>
>
