Hi, all,

Dave just propose a good idea to solve the incompatible issue of STRING and 
VARCHAR mapping differently.

Trafodion used to use CQD HIVE_MAX_STRING_LENGTH to specify the length of the 
HIVE string and map to Trafodion VARCHAR(n), which n is in bytes and defined by 
that CQD.
Since HIVE VARCHAR is multi-byte, we have to map it into Trafodion VARCHAR(m), 
which m is in CHARACTER. So this will be confusing to the end user.
Dave’s proposal is to rename the HIVE_MAX_STRING_LENGTH to 
HIVE_MAX_STRING_LENGTH_IN_BYTES, which will be clear to the end user.
Since that CQD is already widely used, so Hans suggested if we can just 
deprecate that.

We want to hear more suggestions from the community.

Thanks,
Ming

发件人: Liu, Ming (Ming) [mailto:[email protected]]
发送时间: 2016年5月26日 23:53
收件人: [email protected]
主题: 答复: Hive STRING and VARCHAR types

Agree with QiFan.
I think we should map closest data type from Hive to Trafodion. Current mapping 
of hive STRING to Trafodion VARCHAR may not be proper. Since Hive can save up 
to 2G in a String column, but Trafodion VARCHAR has much smaller max size. So 
if there is a 2G hive string data, how can we convert it into VARCHAR? We can 
implicitly truncate like Impala, but that not seems good.
But why I cannot find an official Hive manual that describes the max length of 
STRING?

This seems a big semantic change, if map Hive STRING into Trafodion CLOB, and 
there will be no confusing anymore, since then STRING and VARCHAR are two very 
different types. VARCHAR(n) will be treated as n Characters.

Thanks,
Ming


发件人: Dave Birdsall [mailto:[email protected]]
发送时间: 2016年5月26日 23:15
收件人: 
[email protected]<mailto:[email protected]>
主题: RE: Hive STRING and VARCHAR types

But CLOB would limit what predicates and functions one could use on the column, 
right?

From: Qifan Chen [mailto:[email protected]<mailto:[email protected]>]
Sent: Thursday, May 26, 2016 5:43 AM
To: 
[email protected]<mailto:[email protected]>
Subject: Re: Hive STRING and VARCHAR types

I wonder if we should consider the same length limit as Hive for a STRING type, 
which is 2GB (http://www.folkstalk.com/2011/11/data-types-in-hive.html). If so, 
the Trafodion mapping should be CLOB?

--Qifan

On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan 
<[email protected]<mailto:[email protected]>> wrote:
From the Cloudera documentation

Text table considerations:

Text data files can contain values that are longer than allowed by the 
VARCHAR(n) length limit. Any extra trailing characters are ignored when Impala 
processes those values during a query

Will Trafodion behave the same way? Having the maximum limit for the individual 
column provides the flexibility and optimal resource utilization. However, 
having the limit in number of bytes for String and number of characters for 
Varchar could be quite confusing for the user.

Selva


From: Hans Zeller [mailto:[email protected]<mailto:[email protected]>]
Sent: Wednesday, May 25, 2016 6:12 PM
To: 
[email protected]<mailto:[email protected]>
Subject: Hive STRING and VARCHAR types

Hi,

Here is a question on Hive data types. Ming is about to add support for Hive 
VARCHAR data types, in addition to the existing STRING type, but we hit a 
question we wanted to pose to the user community. Here is a proposed type 
mapping from Hive to Trafodion:

Hive type

Trafodion type

Max # of chars

Size in bytes

Existing/new

Comments

STRING

VARCHAR(n BYTES)

n/4 to n

n

existing

n is determined by the HIVE_MAX_STRING_LENGTH CQD

VARCHAR(m)

VARCHAR(m)

m

4*m

proposed


Is it ok if we treat STRING and VARCHAR differently?

Thanks,

Ming and Hans



--
Regards, --Qifan

Reply via email to