theosib-amazon commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1137610598
That batch reader in Presto reminds me of some of the experimental changes I
made in Trino. I modified PrimitiveColumnReader to work out how many of each
data item it needs to
theosib-amazon commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1136528207
> > Is byte (and arrays and buffers of bytes) the only datatype you support?
My PR is optimizing code paths that pull ints, longs, and other sizes out of
the data buffers. Are
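For context on "pulling ints, longs, and other sizes out of the data buffers": Parquet's PLAIN encoding stores INT32 and INT64 values as little-endian fixed-width bytes. Below is a minimal sketch that decodes a run of them from a `ByteBuffer` in one bulk call rather than one `getInt()` per value. The class and method names (`PlainDecodeSketch`, `readInts`, `readLongs`) are hypothetical. This is not ParquetMR's API or the approach taken in the PR, and whether it is faster than a per-value loop depends on the JVM and buffer type.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: bulk-decoding PLAIN-encoded INT32/INT64 values,
// which Parquet stores little-endian, from an in-memory page buffer.
public class PlainDecodeSketch {

    // Decode `count` 32-bit ints starting at the buffer's current position.
    static int[] readInts(ByteBuffer page, int count) {
        // duplicate() leaves the caller's position untouched; set LE order explicitly.
        ByteBuffer buf = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        // One bulk get through an IntBuffer view instead of `count` getInt() calls.
        buf.asIntBuffer().get(out);
        return out;
    }

    // Same idea for 64-bit longs.
    static long[] readLongs(ByteBuffer page, int count) {
        ByteBuffer buf = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        long[] out = new long[count];
        buf.asLongBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer ints = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        ints.putInt(1).putInt(-2).putInt(300);
        ints.flip();
        System.out.println(Arrays.toString(readInts(ints, 3)));   // prints [1, -2, 300]

        ByteBuffer longs = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        longs.putLong(1L << 40).putLong(-7L);
        longs.flip();
        System.out.println(Arrays.toString(readLongs(longs, 2))); // prints [1099511627776, -7]
    }
}
```

Because the byte order is set on the duplicate, the caller's buffer can be in either order and its position is not advanced.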
theosib-amazon commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1136526711
This is interesting, because when I did profiling of Trino, I found that
although I/O (from S3, over the network no less) was significant, even more
time was spent in compute.
theosib-amazon commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1130275378
@parthchandra One thing that confuses me a bit is that these buffers have
only ByteBuffer inside them. There's no actual I/O, so it's not possible to
block. Do you have
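The point that a stream backed only by a `ByteBuffer` cannot block can be illustrated with a minimal sketch. `InMemoryStream` is a hypothetical class, not one of the ParquetMR stream classes the PR touches. Every read either copies bytes that are already in memory or reports end of stream, so there is no file, socket, or syscall to wait on.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch: an InputStream served entirely from an in-memory ByteBuffer.
// No file, socket, or syscall is involved, so no read can block waiting for data.
public class InMemoryStream extends InputStream {
    private final ByteBuffer buf;

    public InMemoryStream(ByteBuffer source) {
        // Independent position/limit so the caller's buffer is not consumed.
        this.buf = source.duplicate();
    }

    @Override
    public int read() {
        // Either a byte is already in memory, or we are at end of stream.
        return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!buf.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buf.remaining());
        buf.get(b, off, n); // plain memory copy
        return n;
    }

    @Override
    public int available() {
        // Everything left can be read without blocking.
        return buf.remaining();
    }

    public static void main(String[] args) {
        InMemoryStream in = new InMemoryStream(ByteBuffer.wrap(new byte[] {10, 20, 30}));
        byte[] dst = new byte[8];
        System.out.println(in.read(dst, 0, 8)); // prints 3
        System.out.println(in.read());          // prints -1
    }
}
```

Since `available()` always equals the bytes remaining, wrapping such a stream in an asynchronous reader would add scheduling overhead without hiding any latency.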
theosib-amazon commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1130176799
@parthchandra Would you mind having a look at my I/O performance
optimization plan for ParquetMR? I think we should coordinate, since we have
some ideas that might overlap what