[ https://issues.apache.org/jira/browse/HIVE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HIVE-4121:
--------------------------------
    Description:
Currently string columns always have dictionaries and numerics are always directly encoded. It would be better to make the encoding depend on a sample of the data. Perhaps the first 100k values should be evaluated for repeated values and the encoding picked for the stripe.

> ORC should have optional dictionaries for both strings and numeric types
> ------------------------------------------------------------------------
>
>                 Key: HIVE-4121
>                 URL: https://issues.apache.org/jira/browse/HIVE-4121
>             Project: Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
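
The sampling idea in the description can be sketched roughly as follows. This is only an illustration of the proposal, not the ORC writer's code: the function name `choose_encoding`, the returned labels, and the 0.5 distinct-ratio cutoff are assumptions made for the example. Only the 100k sample size comes from the issue text.

```python
from itertools import islice

SAMPLE_SIZE = 100_000        # "first 100k values", taken from the issue description
DICTIONARY_THRESHOLD = 0.5   # hypothetical cutoff; the issue does not specify one


def choose_encoding(values, sample_size=SAMPLE_SIZE, threshold=DICTIONARY_THRESHOLD):
    """Pick an encoding label for one stripe of a column (string or numeric).

    Looks only at the first `sample_size` values and measures how often
    values repeat. Few distinct values means a dictionary is likely to pay off.
    """
    sample = list(islice(values, sample_size))
    if not sample:
        # Nothing to evaluate; fall back to direct encoding (an assumption).
        return "DIRECT"
    distinct_ratio = len(set(sample)) / len(sample)
    return "DICTIONARY" if distinct_ratio <= threshold else "DIRECT"


if __name__ == "__main__":
    print(choose_encoding(["us", "uk"] * 1000))  # heavily repeated strings
    print(choose_encoding(range(1000)))          # all-distinct numerics
```

Because the decision depends only on the sampled values, the same check applies to string and numeric columns alike, which is the point of the proposal. A real writer would also have to handle the values already buffered before the decision is made.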