alamb opened a new issue #171:
URL: https://github.com/apache/arrow-rs/issues/171


   *Note*: migrated from original JIRA: 
https://issues.apache.org/jira/browse/ARROW-11410
   
   Currently the Rust parquet reader returns a regular array (e.g. a string 
array) even when the column is dictionary-encoded in the parquet file.
   
   If the parquet reader could return dictionary arrays for 
dictionary-encoded columns, this would bring several benefits:
    * faster reading of dictionary-encoded columns from parquet (no 
conversion/expansion into a regular array would be necessary)
    * lower memory use, since each distinct value is stored once and rows 
hold only small integer keys
    * faster filtering, since SIMD can be used to filter over the 
numeric keys of a dictionary string array instead of comparing string values in 
a string array
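
To make the memory and filtering points concrete, here is a minimal, std-only sketch of dictionary encoding. `DictColumn`, `encode`, `filter_eq` and `decode` are hypothetical names made up for illustration; they are not the arrow or parquet crate API, and the key loop is plain scalar code rather than SIMD.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column (illustration only,
/// not the arrow crate's DictionaryArray).
#[derive(Debug, PartialEq)]
pub struct DictColumn {
    /// One small integer per row, indexing into `values`.
    pub keys: Vec<u32>,
    /// Each distinct string stored exactly once, in first-seen order.
    pub values: Vec<String>,
}

/// Build a dictionary column from plain string rows.
pub fn encode(rows: &[&str]) -> DictColumn {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(rows.len());
    for &row in rows {
        // Key that this row would get if it is a new distinct value.
        let next = values.len() as u32;
        let key = *index.entry(row).or_insert_with(|| {
            values.push(row.to_string());
            next
        });
        keys.push(key);
    }
    DictColumn { keys, values }
}

/// Equality filter: look the target up in the dictionary once,
/// then compare integer keys instead of comparing strings per row.
pub fn filter_eq(col: &DictColumn, target: &str) -> Vec<bool> {
    match col.values.iter().position(|v| v.as_str() == target) {
        Some(k) => {
            let k = k as u32;
            col.keys.iter().map(|&key| key == k).collect()
        }
        // Target is not in the dictionary, so no row can match.
        None => vec![false; col.keys.len()],
    }
}

/// Expand back into a regular string column. This is the conversion
/// step that returning dictionary arrays directly would avoid.
pub fn decode(col: &DictColumn) -> Vec<String> {
    col.keys
        .iter()
        .map(|&k| col.values[k as usize].clone())
        .collect()
}
```

For example, `encode(&["a", "b", "a", "c", "a"])` stores three strings and five `u32` keys, and `filter_eq(&col, "a")` performs a single string lookup followed by integer comparisons over the keys.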
   
   [~nevime], [~alamb] let me know what you think


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

