[jira] [Commented] (PARQUET-671) Improve performance of RLE/bit-packed decoding in parquet-cpp
[ https://issues.apache.org/jira/browse/PARQUET-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402877#comment-15402877 ]

Julien Le Dem commented on PARQUET-671:
---------------------------------------

Thanks [~edaniel]
Looking forward to seeing your PR

> Improve performance of RLE/bit-packed decoding in parquet-cpp
> -------------------------------------------------------------
>
>                 Key: PARQUET-671
>                 URL: https://issues.apache.org/jira/browse/PARQUET-671
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Eric Daniel
>            Assignee: Eric Daniel
>
> There are steps that can dramatically improve decoding performance:
> - when decoding repeated values in the rle/dictionary encoding, do the
>   dictionary lookup only once
> - when decoding bit-packed sequences, do the decoding in batches so the bit
>   unpacker's state can be kept in registers (instead of updating members for
>   every decoded value)
> - use Daniel Lemire's fast unpacking routines whenever possible
>   (https://github.com/lemire/FrameOfReference/)
> I have a PR ready to implement these changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PARQUET-671) Improve performance of RLE/bit-packed decoding in parquet-cpp
[ https://issues.apache.org/jira/browse/PARQUET-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated PARQUET-671:
---------------------------------
    Assignee: Eric Daniel

> Improve performance of RLE/bit-packed decoding in parquet-cpp
> -------------------------------------------------------------
>
>                 Key: PARQUET-671
>                 URL: https://issues.apache.org/jira/browse/PARQUET-671
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Eric Daniel
>            Assignee: Eric Daniel
>
> There are steps that can dramatically improve decoding performance:
> - when decoding repeated values in the rle/dictionary encoding, do the
>   dictionary lookup only once
> - when decoding bit-packed sequences, do the decoding in batches so the bit
>   unpacker's state can be kept in registers (instead of updating members for
>   every decoded value)
> - use Daniel Lemire's fast unpacking routines whenever possible
>   (https://github.com/lemire/FrameOfReference/)
> I have a PR ready to implement these changes.
[jira] [Created] (PARQUET-671) Improve performance of RLE/bit-packed decoding in parquet-cpp
Eric Daniel created PARQUET-671:
--------------------------------

             Summary: Improve performance of RLE/bit-packed decoding in parquet-cpp
                 Key: PARQUET-671
                 URL: https://issues.apache.org/jira/browse/PARQUET-671
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Eric Daniel

There are steps that can dramatically improve decoding performance:
- when decoding repeated values in the rle/dictionary encoding, do the
  dictionary lookup only once
- when decoding bit-packed sequences, do the decoding in batches so the bit
  unpacker's state can be kept in registers (instead of updating members for
  every decoded value)
- use Daniel Lemire's fast unpacking routines whenever possible
  (https://github.com/lemire/FrameOfReference/)
I have a PR ready to implement these changes.
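
For readers of this thread, the first two steps in the issue description might look roughly like the sketch below. This is not the code from the PR and not the parquet-cpp API: the names DecodeRleRun and UnpackBatch are hypothetical, and the sketch assumes values are bit-packed least-significant-bit first with a bit width between 1 and 32. The third step (Daniel Lemire's routines) is left out because it depends on an external library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand one RLE run of a dictionary-encoded column.
// The dictionary is consulted once for the whole run instead of once per
// repeated value.
template <typename T>
void DecodeRleRun(const std::vector<T>& dictionary, uint32_t index,
                  std::size_t run_length, std::vector<T>* out) {
  assert(index < dictionary.size());
  const T value = dictionary[index];  // single lookup for the run
  out->insert(out->end(), run_length, value);
}

// Hypothetical helper: unpack `count` values of `bit_width` bits each from
// `data` (assumed to be packed LSB-first). The bit buffer and the cursors
// are local variables, so the compiler is free to keep them in registers
// for the whole batch rather than writing object members after every value.
inline void UnpackBatch(const uint8_t* data, std::size_t data_size,
                        int bit_width, std::size_t count, uint32_t* out) {
  assert(bit_width >= 1 && bit_width <= 32);
  assert(data_size * 8 >= count * static_cast<std::size_t>(bit_width));
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  uint64_t buffer = 0;     // bits loaded from `data` but not yet consumed
  int bits_in_buffer = 0;  // number of valid bits in `buffer`
  std::size_t pos = 0;     // index of the next input byte
  for (std::size_t i = 0; i < count; ++i) {
    // Refill one byte at a time until a full value is available.
    while (bits_in_buffer < bit_width) {
      buffer |= static_cast<uint64_t>(data[pos++]) << bits_in_buffer;
      bits_in_buffer += 8;
    }
    out[i] = static_cast<uint32_t>(buffer & mask);
    buffer >>= bit_width;
    bits_in_buffer -= bit_width;
  }
}
```

A caller would invoke UnpackBatch once per bit-packed run and DecodeRleRun once per repeated run, then copy any decoder state it needs back into its members after the batch. With a bit width of 3, the bytes 0x88 0xC6 0xFA unpack to the values 0 through 7.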