Liya Fan created ARROW-6184:
-------------------------------

             Summary: [Java] Provide hash table based dictionary encoder
                 Key: ARROW-6184
                 URL: https://issues.apache.org/jira/browse/ARROW-6184
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java
            Reporter: Liya Fan
            Assignee: Liya Fan

This is the second part of ARROW-5917. We provide a sort based encoder as well as a hash table based encoder to address problems with the current dictionary encoder. In particular, they address the following problems:

# Values are repeatedly converted between Java objects and bytes (e.g. vector.getObject(i)).
# Memory is copied unnecessarily (the vector data must be copied to the hash table).
# The hash table cannot be reused for encoding multiple vectors (other data structures and results cannot be reused either).
# The output vector should not be created or managed by the encoder (just like in the out-of-place sorter).
# The hash table requires that the hashCode and equals methods be implemented appropriately, but this is not guaranteed.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
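
As a rough illustration of the design goals listed in the description, here is a minimal, self-contained sketch of a hash table based dictionary encoder. It is not the Arrow API and not the implementation proposed in this issue: the class names HashDictionaryEncoderSketch and ByteColumn, the flat data/offsets layout, and the linear-probing table are all assumptions made for the example. The sketch hashes and compares raw bytes directly (no per-value Java objects, no reliance on hashCode/equals), stores only dictionary indices in the table (no copy of the values), can be reused for several inputs, and writes into an output array owned by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch (not the Arrow API) of a reusable hash table based dictionary encoder. */
final class HashDictionaryEncoderSketch {

  /** Stand-in for a variable-width vector: value i is data[offsets[i] .. offsets[i + 1]). */
  static final class ByteColumn {
    final byte[] data;
    final int[] offsets;

    ByteColumn(String... values) {
      byte[][] encoded = new byte[values.length][];
      offsets = new int[values.length + 1];
      for (int i = 0; i < values.length; i++) {
        encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
        offsets[i + 1] = offsets[i] + encoded[i].length;
      }
      data = new byte[offsets[values.length]];
      for (int i = 0; i < values.length; i++) {
        System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
      }
    }

    int size() {
      return offsets.length - 1;
    }
  }

  private final ByteColumn dictionary;
  // Open-addressing table that holds dictionary indices only; -1 marks an empty slot.
  private final int[] slots;

  HashDictionaryEncoderSketch(ByteColumn dictionary) {
    this.dictionary = dictionary;
    // Power-of-two capacity strictly greater than twice the dictionary size.
    int capacity = Integer.highestOneBit(Math.max(4, dictionary.size() * 2)) * 2;
    slots = new int[capacity];
    Arrays.fill(slots, -1);
    for (int i = 0; i < dictionary.size(); i++) {
      int slot = findSlot(dictionary, i);
      if (slots[slot] == -1) {
        slots[slot] = i; // the first occurrence wins if the dictionary has duplicates
      }
    }
  }

  /** Hash computed directly over the value's bytes, without building a Java object. */
  private static int hash(ByteColumn col, int index) {
    int h = 0;
    for (int p = col.offsets[index]; p < col.offsets[index + 1]; p++) {
      h = 31 * h + col.data[p];
    }
    return h;
  }

  /** Byte-wise equality between value i of column a and value j of column b. */
  private static boolean sameBytes(ByteColumn a, int i, ByteColumn b, int j) {
    int length = a.offsets[i + 1] - a.offsets[i];
    if (length != b.offsets[j + 1] - b.offsets[j]) {
      return false;
    }
    for (int k = 0; k < length; k++) {
      if (a.data[a.offsets[i] + k] != b.data[b.offsets[j] + k]) {
        return false;
      }
    }
    return true;
  }

  /** Linear probing: returns the slot that holds a matching entry, or the first empty slot. */
  private int findSlot(ByteColumn col, int index) {
    int mask = slots.length - 1;
    int slot = hash(col, index) & mask;
    while (slots[slot] != -1 && !sameBytes(col, index, dictionary, slots[slot])) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  /** Encodes input into a caller-owned output array; the table is left unchanged for reuse. */
  void encode(ByteColumn input, int[] output) {
    for (int i = 0; i < input.size(); i++) {
      int dictIndex = slots[findSlot(input, i)];
      if (dictIndex == -1) {
        throw new IllegalArgumentException("value at index " + i + " is not in the dictionary");
      }
      output[i] = dictIndex;
    }
  }

  public static void main(String[] args) {
    HashDictionaryEncoderSketch encoder =
        new HashDictionaryEncoderSketch(new ByteColumn("apple", "banana", "cherry"));
    int[] output = new int[4];
    encoder.encode(new ByteColumn("banana", "apple", "banana", "cherry"), output);
    System.out.println(Arrays.toString(output)); // prints [1, 0, 1, 2]
  }
}
```

The same encoder instance can be passed further inputs without rebuilding the table, and the caller decides how the output array is allocated and released, which mirrors the reuse and ownership points in the list above.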