[ https://issues.apache.org/jira/browse/HIVE-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404794#comment-16404794 ]

BELUGA BEHR commented on HIVE-16889:
------------------------------------

Would it be possible to create an HS2 configuration that would disable these 
checks, even if they are defined in the schema?  This would allow third-party 
applications to continue to use the VARCHAR data type without incurring this 
overhead.
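
As a rough sketch, such a toggle could be consulted at the point where the 
length check happens today.  Everything below is hypothetical: the property 
name "hive.varchar.enforce.length" and the helper class are made up to 
illustrate the idea, not existing Hive code.

{code:java}
import org.apache.hadoop.hive.common.type.HiveVarchar;
import org.apache.hadoop.hive.conf.HiveConf;

public class VarcharEnforcementSketch {
  // Made-up property name; used only to illustrate the proposed toggle.
  private static final String ENFORCE_KEY = "hive.varchar.enforce.length";

  public static HiveVarchar maybeEnforce(HiveConf conf, HiveVarchar value,
      int maxLength) {
    // Toggle off: trust the writer and skip the check entirely, avoiding
    // the deserialize/truncate/reserialize cycle described in this issue.
    if (!conf.getBoolean(ENFORCE_KEY, true)) {
      return value;
    }
    // Toggle on (the default): behave as today and truncate to the precision.
    if (value.getCharacterLength() > maxLength) {
      return new HiveVarchar(value, maxLength);
    }
    return value;
  }
}
{code}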

> Improve Performance Of VARCHAR
> ------------------------------
>
>                 Key: HIVE-16889
>                 URL: https://issues.apache.org/jira/browse/HIVE-16889
>             Project: Hive
>          Issue Type: Improvement
>          Components: Types
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: BELUGA BEHR
>            Assignee: Janaki Lahorani
>            Priority: Major
>
> Organizations often use tools that create table schemas on the fly, and 
> these tools specify VARCHAR columns with a precision.  In these scenarios, 
> performance suffers, even though one might expect it to improve: the size of 
> the data is known in advance, so buffers could be set up more efficiently 
> than in the case where no such knowledge exists.
> Most of the overhead comes from reading a STRING from a file into a byte 
> buffer, checking the length of the STRING, truncating the STRING if needed, 
> and then serializing the STRING back into bytes again (sketched in the code 
> example after the list below).
> From the code, I have identified several areas where developers left notes 
> about later improvements:
> # org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
> # org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
> # org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
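> To make the cost concrete, here is a minimal sketch of the 
> decode/truncate/re-encode pattern these methods share.  The class below is 
> made up for illustration; it is not the actual Hive source.
> {code:java}
> import org.apache.hadoop.hive.common.type.HiveVarchar;
> import org.apache.hadoop.io.Text;
>
> // Simplified illustration of the enforce-on-read cycle; not the actual
> // Hive code.
> public class EnforceOnReadSketch {
>   public static Text enforce(Text raw, int maxLength) {
>     // The bytes are already sitting in the Text buffer...
>     String decoded = raw.toString();                 // decode: bytes -> String
>     // ...but enforcing the declared precision forces a full round trip.
>     HiveVarchar vc = new HiveVarchar(decoded, maxLength); // check + truncate
>     return new Text(vc.getValue());                  // re-encode: String -> bytes
>   }
> }
> {code}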



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
