[GitHub] [parquet-mr] theosib-amazon commented on pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-03-29 Thread GitBox
theosib-amazon commented on pull request #953: URL: https://github.com/apache/parquet-mr/pull/953#issuecomment-1081948012 I forgot to add this to a comment in the code: The reason PlainValuesReader still includes an unused LittleEndianDataInputStream member is that if I don't, the bui

[GitHub] [parquet-mr] theosib-amazon opened a new pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-03-29 Thread GitBox
theosib-amazon opened a new pull request #953: URL: https://github.com/apache/parquet-mr/pull/953 This PR is all performance optimization. In benchmarking with Trino, we find that query performance improves by 5% to 15%, depending on the query, and that includes all the I/O time from S3.