I see.
In our case, we use SingleBufferInputStream, so the time is spent duplicating
the backing byte buffer.
Thanks
Chang
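A minimal sketch of the pattern Chang describes (this is an illustration, not Parquet's exact SingleBufferInputStream code): a slice() that duplicates the backing buffer's view on every call, so slicing per 8-value group in a tight loop allocates one short-lived ByteBuffer object per group.

```java
import java.nio.ByteBuffer;

// Sketch of a single-buffer input whose slice() duplicates the view each call.
class SingleBufferSketch {
  private final ByteBuffer buffer;

  SingleBufferSketch(ByteBuffer buffer) { this.buffer = buffer; }

  // Each call creates a new ByteBuffer view over the same bytes
  // (a view copy, not a data copy), then advances the source position.
  ByteBuffer slice(int length) {
    ByteBuffer copy = buffer.duplicate();
    copy.limit(copy.position() + length);
    buffer.position(buffer.position() + length);
    return copy;
  }
}
```

Even though no bytes are copied, calling this once per 8 values produces a fresh object on every iteration, which is where the time shows up in the profile.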
Ryan Blue wrote on Tuesday, September 15, 2020 at 2:04 AM:
Before, the input was a byte array so we could read from it directly. Now,
the input is a `ByteBufferInputStream` so that Parquet can choose how to
allocate buffers. For example, we use vectored reads from S3 that pull back
multiple buffers in parallel.
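A toy sketch of why callers can no longer assume one contiguous backing array (this is an illustration of the idea, not Parquet's actual ByteBufferInputStream API): an input abstraction over possibly many buffers, where reads have to cross buffer boundaries.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Toy input over multiple backing buffers, e.g. filled by parallel
// vectored reads; reads walk from one buffer to the next.
class MultiBufferInput {
  private final List<ByteBuffer> buffers;
  private int current = 0;

  MultiBufferInput(List<ByteBuffer> buffers) { this.buffers = buffers; }

  // Read one byte, moving across buffer boundaries as needed;
  // returns -1 when all buffers are exhausted.
  int read() {
    while (current < buffers.size() && !buffers.get(current).hasRemaining()) {
      current++;
    }
    if (current == buffers.size()) return -1;
    return buffers.get(current).get() & 0xFF;
  }
}
```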
Now that the input is a stream based on poss
Ryan, do you happen to have any opinion there? That particular section
was introduced in the Parquet 1.10 update:
https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
It looks like it didn't use to create a ByteBuffer each time, but read from `in` directly.
On Sun, Sep 13, 2020 at 10:
I think we can copy all of the encoded data into a ByteBuffer once, and unpack
the values in the loop:

while (valueIndex < this.currentCount) {
  // values are bit packed 8 at a time, so reading bitWidth will always work
  this.packer.unpack8Values(buffer, buffer.position() + valueIndex,
      this.currentBuffer, valueIndex);
  valueIndex += 8;
}
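A runnable sketch of the proposal above, with a simplified stand-in for Parquet's packer (assumptions: bitWidth = 1, so each 8-value group is one byte, and unpack8Values here is my own one-bit unpacker, not Parquet's): unpacking every group from a single buffer via offset arithmetic yields the same values as slicing a fresh buffer per group.

```java
import java.nio.ByteBuffer;

// Compares per-group slicing against a single buffer with computed offsets.
class UnpackSketch {
  // Simplified stand-in for Parquet's unpack8Values, for bitWidth = 1:
  // unpack 8 one-bit values from the byte at inPos (LSB first).
  static void unpack8Values(ByteBuffer buf, int inPos, int[] out, int outPos) {
    int b = buf.get(inPos) & 0xFF;
    for (int i = 0; i < 8; i++) out[outPos + i] = (b >>> i) & 1;
  }

  // Current shape: a fresh ByteBuffer view for every 8-value group.
  static int[] perGroup(byte[] data) {
    int bitWidth = 1;
    int numGroups = data.length;
    int[] out = new int[numGroups * 8];
    ByteBuffer in = ByteBuffer.wrap(data);
    for (int g = 0; g < numGroups; g++) {
      ByteBuffer group = in.duplicate();      // new object per group
      group.position(g * bitWidth);
      unpack8Values(group, group.position(), out, g * 8);
    }
    return out;
  }

  // Proposed shape: one buffer for all groups, offsets from valueIndex.
  static int[] once(byte[] data) {
    int bitWidth = 1;
    int[] out = new int[data.length * 8];
    ByteBuffer buffer = ByteBuffer.wrap(data); // single buffer, no per-group slice
    int valueIndex = 0;
    while (valueIndex < out.length) {
      unpack8Values(buffer, buffer.position() + (valueIndex / 8) * bitWidth,
          out, valueIndex);
      valueIndex += 8;
    }
    return out;
  }
}
```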
It certainly can't be called once - it's reading different data each time.
There might be a faster way to do it, I don't know. Do you have ideas?
On Sun, Sep 13, 2020 at 9:25 PM Chang Chen wrote:
Hi experts,

It looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup():
case PACKED:
  int numGroups = header >>> 1;
  this.currentCount = numGroups * 8;
  if (this.currentBuffer.length < this.currentCount) {
    this.currentBuffer = new int[this.currentCount];
  }
  currentBufferIdx = 0;
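For reference, the header arithmetic in the snippet above follows Parquet's RLE/bit-packing hybrid encoding: the low bit of the run header selects RLE (0) versus bit-packed (1), and for bit-packed runs the remaining bits hold the number of 8-value groups. A small sketch of that decoding (my own helper names, not Spark's):

```java
// Decodes a run header from Parquet's RLE/bit-packing hybrid encoding.
class HeaderDemo {
  // Low bit 1 means a bit-packed run, 0 means an RLE run.
  static boolean isPacked(int header) { return (header & 1) == 1; }

  // Bit-packed runs store a group count; each group holds 8 values.
  static int packedValueCount(int header) { return (header >>> 1) * 8; }

  // RLE runs store the repeated-value count directly.
  static int rleRunLength(int header) { return header >>> 1; }
}
```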