Hi, all, Dave just propose a good idea to solve the incompatible issue of STRING and VARCHAR mapping differently.
Trafodion used to use CQD HIVE_MAX_STRING_LENGTH to specify the length of the HIVE string and map to Trafodion VARCHAR(n), which n is in bytes and defined by that CQD. Since HIVE VARCHAR is multi-byte, we have to map it into Trafodion VARCHAR(m), which m is in CHARACTER. So this will be confusing to the end user. Dave’s proposal is to rename the HIVE_MAX_STRING_LENGTH to HIVE_MAX_STRING_LENGTH_IN_BYTES, which will be clear to the end user. Since that CQD is already widely used, so Hans suggested if we can just deprecate that. We want to hear more suggestions from the community. Thanks, Ming 发件人: Liu, Ming (Ming) [mailto:[email protected]] 发送时间: 2016年5月26日 23:53 收件人: [email protected] 主题: 答复: Hive STRING and VARCHAR types Agree with QiFan. I think we should map closest data type from Hive to Trafodion. Current mapping of hive STRING to Trafodion VARCHAR may not be proper. Since Hive can save up to 2G in a String column, but Trafodion VARCHAR has much smaller max size. So if there is a 2G hive string data, how can we convert it into VARCHAR? We can implicitly truncate like Impala, but that not seems good. But why I cannot find an official Hive manual that describes the max length of STRING? This seems a big semantic change, if map Hive STRING into Trafodion CLOB, and there will be no confusing anymore, since then STRING and VARCHAR are two very different types. VARCHAR(n) will be treated as n Characters. Thanks, Ming 发件人: Dave Birdsall [mailto:[email protected]] 发送时间: 2016年5月26日 23:15 收件人: [email protected]<mailto:[email protected]> 主题: RE: Hive STRING and VARCHAR types But CLOB would limit what predicates and functions one could use on the column, right? From: Qifan Chen [mailto:[email protected]<mailto:[email protected]>] Sent: Thursday, May 26, 2016 5:43 AM To: [email protected]<mailto:[email protected]> Subject: Re: Hive STRING and VARCHAR types I wonder if we should consider the same length limit as Hive for a STRING type, which is 2GB (http://www.folkstalk.com/2011/11/data-types-in-hive.html). If so, the Trafodion mapping should be CLOB? --Qifan On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan <[email protected]<mailto:[email protected]>> wrote: From the Cloudera documentation Text table considerations: Text data files can contain values that are longer than allowed by the VARCHAR(n) length limit. Any extra trailing characters are ignored when Impala processes those values during a query Will Trafodion behave the same way? Having the maximum limit for the individual column provides the flexibility and optimal resource utilization. However, having the limit in number of bytes for String and number of characters for Varchar could be quite confusing for the user. Selva From: Hans Zeller [mailto:[email protected]<mailto:[email protected]>] Sent: Wednesday, May 25, 2016 6:12 PM To: [email protected]<mailto:[email protected]> Subject: Hive STRING and VARCHAR types Hi, Here is a question on Hive data types. Ming is about to add support for Hive VARCHAR data types, in addition to the existing STRING type, but we hit a question we wanted to pose to the user community. Here is a proposed type mapping from Hive to Trafodion: Hive type Trafodion type Max # of chars Size in bytes Existing/new Comments STRING VARCHAR(n BYTES) n/4 to n n existing n is determined by the HIVE_MAX_STRING_LENGTH CQD VARCHAR(m) VARCHAR(m) m 4*m proposed Is it ok if we treat STRING and VARCHAR differently? Thanks, Ming and Hans -- Regards, --Qifan
