Re: Performance of VectorizedRleValuesReader
I see. In our case we use SingleBufferInputStream, so the time is spent duplicating the backing byte buffer.

Thanks,
Chang

On Tue, Sep 15, 2020 at 2:04 AM, Ryan Blue wrote:
> [...]
Re: Performance of VectorizedRleValuesReader
Before, the input was a byte array so we could read from it directly. Now, the input is a `ByteBufferInputStream` so that Parquet can choose how to allocate buffers. For example, we use vectored reads from S3 that pull back multiple buffers in parallel.

Now that the input is a stream based on possibly multiple byte buffers, it provides a method to get a buffer of a certain length. In most cases, that will create a ByteBuffer with the same backing byte array, but it may need to copy if the request spans multiple buffers in the stream. Most of the time, the call to `slice` only requires duplicating the buffer and setting its limit, but a read that spans multiple buffers is expensive. It would be helpful to know whether the time is spent copying data, which would indicate the backing buffers are too small, or whether it is spent duplicating the backing byte buffer.

On Mon, Sep 14, 2020 at 5:29 AM, Sean Owen wrote:
> [...]

--
Ryan Blue
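The two costs described above (a cheap duplicate-and-limit view versus a real copy when a read spans buffers) can be sketched with plain NIO buffers. This is a simplified model for illustration, not the actual Parquet ByteBufferInputStream API; the class and method names are made up.

```java
import java.nio.ByteBuffer;

// Simplified model of the two slice() paths: a request that fits in the
// current buffer returns a zero-copy view (duplicate + limit); a request
// that spans into the next buffer must allocate a fresh buffer and copy.
public class SliceCost {
    // Fast path: shares the backing array, only buffer bookkeeping.
    public static ByteBuffer sliceWithin(ByteBuffer buf, int len) {
        ByteBuffer view = buf.duplicate();       // same backing array
        view.limit(view.position() + len);       // window of len bytes
        buf.position(buf.position() + len);      // advance the source
        return view.slice();
    }

    // Slow path: the request spans two buffers, so bytes are copied.
    public static ByteBuffer sliceSpanning(ByteBuffer first, ByteBuffer second, int len) {
        ByteBuffer out = ByteBuffer.allocate(len);
        while (out.hasRemaining()) {
            ByteBuffer src = first.hasRemaining() ? first : second;
            out.put(src.get());
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {1, 2, 3});
        ByteBuffer view = sliceWithin(a, 2);
        System.out.println(view.remaining());    // 2, no bytes copied
        ByteBuffer copy = sliceSpanning(a, ByteBuffer.wrap(new byte[] {4, 5}), 2);
        System.out.println(copy.get(0) + " " + copy.get(1)); // 3 4, copied
    }
}
```

The point of the sketch: per-loop `slice` calls on a single backing buffer pay only the duplicate-and-limit cost, while small backing buffers force the copying path more often.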
Re: Performance of VectorizedRleValuesReader
Ryan, do you happen to have any opinion there? That particular section was introduced in the Parquet 1.10 update:

https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20

It looks like it didn't use to make a ByteBuffer each time, but read from `in` directly.

On Sun, Sep 13, 2020 at 10:48 PM, Chang Chen wrote:
> [...]
Re: Performance of VectorizedRleValuesReader
I think we can copy all the encoded data into a ByteBuffer once, and unpack values in the loop:

    while (valueIndex < this.currentCount) {
      // values are bit packed 8 at a time, so reading bitWidth will always work
      this.packer.unpack8Values(buffer, buffer.position() + valueIndex,
          this.currentBuffer, valueIndex);
      valueIndex += 8;
    }

On Mon, Sep 14, 2020 at 10:40 AM, Sean Owen wrote:
> [...]
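The copy-once idea can be sketched end to end as follows. This is a hedged illustration, not the real reader: `unpack8Values` here is a stand-in for Parquet's BytePacker fixed at bitWidth == 8 (one byte per value), and `readGroups` plays the role of the readNextGroup() loop with a single slice taken before the loop instead of one per 8-value group.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of "slice once, index inside the loop". With bitWidth == 8,
// each group of 8 values occupies exactly 8 bytes.
public class SliceOnce {
    static final int BIT_WIDTH = 8;

    // Stand-in for packer.unpack8Values(ByteBuffer, int, int[], int).
    static void unpack8Values(ByteBuffer in, int inPos, int[] out, int outPos) {
        for (int i = 0; i < 8; i++) {
            out[outPos + i] = in.get(inPos + i) & 0xFF;
        }
    }

    public static int[] readGroups(ByteBuffer in, int numGroups) {
        int currentCount = numGroups * 8;
        int[] currentBuffer = new int[currentCount];
        // One buffer for the whole packed run; a real reader would take
        // in.slice(numGroups * BIT_WIDTH) here, outside the loop.
        ByteBuffer buffer = in;
        int offset = buffer.position();
        for (int valueIndex = 0; valueIndex < currentCount; valueIndex += 8) {
            unpack8Values(buffer, offset, currentBuffer, valueIndex);
            offset += BIT_WIDTH; // advance one 8-value group per iteration
        }
        return currentBuffer;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        for (int i = 0; i < 16; i++) data[i] = (byte) i;
        // Two groups of 8 values, unpacked with a single up-front buffer.
        System.out.println(Arrays.toString(readGroups(ByteBuffer.wrap(data), 2)));
    }
}
```

The per-group allocation of a ByteBuffer view disappears; the trade-off Ryan raises below is that a single up-front slice spanning multiple backing buffers forces one larger copy.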
Re: Performance of VectorizedRleValuesReader
It certainly can't be called once; it's reading different data each time. There might be a faster way to do it, I don't know. Do you have ideas?

On Sun, Sep 13, 2020 at 9:25 PM, Chang Chen wrote:
> [...]
Performance of VectorizedRleValuesReader
Hi experts,

It looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup():

    case PACKED:
      int numGroups = header >>> 1;
      this.currentCount = numGroups * 8;

      if (this.currentBuffer.length < this.currentCount) {
        this.currentBuffer = new int[this.currentCount];
      }
      currentBufferIdx = 0;
      int valueIndex = 0;
      while (valueIndex < this.currentCount) {
        // values are bit packed 8 at a time, so reading bitWidth will always work
        ByteBuffer buffer = in.slice(bitWidth);
        this.packer.unpack8Values(buffer, buffer.position(),
            this.currentBuffer, valueIndex);
        valueIndex += 8;
      }

Per my profiling, the code spends about 30% of readNextGroup()'s time in slice. Why can't we call slice outside the loop?
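The comment "values are bit packed 8 at a time, so reading bitWidth will always work" can be checked with a toy packer: 8 values of bitWidth bits occupy exactly bitWidth bytes, which is why each loop iteration consumes bitWidth bytes from the stream. This is a self-contained sketch using a simplified little-endian bit order, not Parquet's actual BytePacker.

```java
// Toy bit packer: 8 values of bitWidth bits fit in exactly bitWidth bytes.
public class BitPackDemo {
    static byte[] pack8(int[] values, int bitWidth) {
        byte[] out = new byte[bitWidth];  // 8 * bitWidth bits == bitWidth bytes
        int bitPos = 0;
        for (int v : values) {
            for (int b = 0; b < bitWidth; b++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (1 << (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    static int[] unpack8(byte[] packed, int bitWidth) {
        int[] out = new int[8];
        int bitPos = 0;
        for (int i = 0; i < 8; i++) {
            for (int b = 0; b < bitWidth; b++) {
                if ((packed[bitPos >>> 3] & (1 << (bitPos & 7))) != 0) {
                    out[i] |= 1 << b;
                }
                bitPos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 3, 4, 5, 6, 7};     // all fit in 3 bits
        byte[] packed = pack8(values, 3);
        System.out.println(packed.length);           // 3 bytes for 8 values
        System.out.println(java.util.Arrays.toString(unpack8(packed, 3)));
    }
}
```

So each call to in.slice(bitWidth) in the loop above fetches exactly one 8-value group, which is why the slice count (and its overhead) scales with currentCount / 8.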