I think we can copy all the encoded data into a ByteBuffer once, and then
unpack the values in a loop:
while (valueIndex < this.currentCount) {
  // values are bit packed 8 at a time, so reading bitWidth will always work
  this.packer.unpack8Values(buffer, buffer.position() + valueIndex,
this.currentBuff
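
To make the idea above concrete, here is a minimal, self-contained sketch of unpacking bit-packed groups straight out of a single ByteBuffer. It is not the Spark or Parquet code: the class name and both methods are hypothetical, and it assumes values are packed least-significant-bit first, as in Parquet's RLE/bit-packing hybrid encoding. Since 8 values of bitWidth bits occupy exactly bitWidth bytes, the byte offset in this sketch advances by bitWidth per group.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; not the actual VectorizedRleValuesReader or BytePacker.
class PackedGroupSketch {

    // Unpack 8 values of `bitWidth` bits each (0..32), LSB first,
    // starting at absolute byte offset `pos` of `in`.
    static void unpack8Values(ByteBuffer in, int pos, int bitWidth, int[] out, int outPos) {
        long mask = (1L << bitWidth) - 1;
        for (int i = 0; i < 8; i++) {
            int startBit = i * bitWidth;
            int byteIndex = pos + (startBit >>> 3);
            int shift = startBit & 7;
            // A value spans at most 5 bytes when bitWidth <= 32.
            int bytesNeeded = (shift + bitWidth + 7) >>> 3;
            long acc = 0;
            for (int b = 0; b < bytesNeeded; b++) {
                acc |= (long) (in.get(byteIndex + b) & 0xFF) << (8 * b);
            }
            out[outPos + i] = (int) ((acc >>> shift) & mask);
        }
    }

    // Decode `numGroups` groups of 8 values from the buffer's current position
    // using absolute reads, then move the position past the consumed bytes.
    static int[] readPackedGroups(ByteBuffer in, int numGroups, int bitWidth) {
        int count = numGroups * 8;
        int[] out = new int[count];
        int byteOffset = in.position();
        int valueIndex = 0;
        while (valueIndex < count) {
            unpack8Values(in, byteOffset, bitWidth, out, valueIndex);
            valueIndex += 8;
            byteOffset += bitWidth; // 8 values * bitWidth bits = bitWidth bytes
        }
        in.position(byteOffset);
        return out;
    }

    public static void main(String[] args) {
        // The values 0..7 packed with bitWidth 3.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {(byte) 0x88, (byte) 0xC6, (byte) 0xFA});
        System.out.println(Arrays.toString(readPackedGroups(buf, 1, 3)));
    }
}
```

The absolute get(int) reads leave the buffer's position alone until the whole run has been decoded, so there is no per-group copy into an intermediate array.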
Hi,
I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
shuffle to improve shuffle efficiency.
Please take a look at:
- SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
- SPIP doc:
https://docs.google.com/document/d/1mYzKVZllA5Flw8AtoX7JUcXBOnNIDADWRbJ7GI6Y
It certainly can't be called once - it's reading different data each time.
There might be a faster way to do it, I don't know. Do you have ideas?
On Sun, Sep 13, 2020 at 9:25 PM Chang Chen wrote:
>
> Hi experts,
>
> it looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup()
>
>
Hi experts,
it looks like there is a hot spot in
VectorizedRleValuesReader#readNextGroup():
case PACKED:
  int numGroups = header >>> 1;
  this.currentCount = numGroups * 8;
  if (this.currentBuffer.length < this.currentCount) {
    this.currentBuffer = new int[this.currentCount];
  }
currentBuf
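
For readers unfamiliar with the header arithmetic in the snippet above, the following is a small illustrative sketch of how a run header in Parquet's RLE/bit-packing hybrid encoding is interpreted. The class and method names are hypothetical and are not taken from Spark; the sketch only assumes the documented layout, where the lowest bit of the header selects the mode and the remaining bits carry the count.

```java
// Hypothetical helper names; only the bit layout comes from the encoding.
class RleHeaderSketch {

    // Lowest bit set: a bit-packed run. Lowest bit clear: an RLE run.
    static boolean isPacked(int header) {
        return (header & 1) == 1;
    }

    // For a bit-packed run the remaining bits count groups of 8 values.
    static int packedValueCount(int header) {
        int numGroups = header >>> 1;
        return numGroups * 8;
    }

    // For an RLE run the remaining bits are the repeat count.
    static int rleRunLength(int header) {
        return header >>> 1;
    }
}
```

For example, a header of 5 (binary 101) describes a bit-packed run of 2 groups, that is 16 values, while a header of 20 (binary 10100) describes an RLE run of length 10.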