[GitHub] [parquet-mr] theosib-amazon commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-05-25 Thread GitBox
theosib-amazon commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1137610598 That batch reader in Presto reminds me of some of the experimental changes I made in Trino. I modified PrimitiveColumnReader to work out how many of each data item it needs to

[GitHub] [parquet-mr] theosib-amazon commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-05-24 Thread GitBox
theosib-amazon commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1136528207 > > Is byte (and arrays and buffers of bytes) the only datatype you support? My PR is optimizing code paths that pull ints, longs, and other sizes out of the data buffers. Are

[GitHub] [parquet-mr] theosib-amazon commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-05-24 Thread GitBox
theosib-amazon commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1136526711 This is interesting, because when I profiled Trino, I found that although I/O (from S3, over the network, no less) was significant, even more time was spent in compute.

[GitHub] [parquet-mr] theosib-amazon commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-05-18 Thread GitBox
theosib-amazon commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1130275378 @parthchandra One thing that confuses me a bit is that these buffers have only ByteBuffer inside them. There's no actual I/O, so it's not possible to block. Do you have

[GitHub] [parquet-mr] theosib-amazon commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-05-18 Thread GitBox
theosib-amazon commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1130176799 @parthchandra Would you mind having a look at my I/O performance optimization plan for ParquetMR? I think we should coordinate, since we have some ideas that might overlap what