Re: Performance of VectorizedRleValuesReader

2020-09-14 Thread Chang Chen
I see.

In our case, we use SingleBufferInputStream, so the time is spent duplicating
the backing byte buffer.
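For the single-buffer case, a minimal sketch of what such a slice costs (this is a hypothetical illustration, not Parquet's actual SingleBufferInputStream): a ByteBuffer.duplicate() plus a limit adjustment, which allocates a small view object but copies no data.

```java
import java.nio.ByteBuffer;

public class SliceSketch {
    // Hypothetical sketch of a single-buffer slice(len): duplicate the
    // view (no data copy) and narrow its limit, then advance the source.
    static ByteBuffer slice(ByteBuffer buf, int len) {
        ByteBuffer view = buf.duplicate();     // shares the backing array
        view.limit(view.position() + len);     // restrict the view to len bytes
        buf.position(buf.position() + len);    // advance the source position
        return view;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap(new byte[]{1, 2, 3, 4, 5, 6});
        ByteBuffer a = slice(in, 2);
        ByteBuffer b = slice(in, 2);
        // Absolute gets: a covers bytes [0,2), b covers bytes [2,4)
        System.out.println(a.get(0) + " " + b.get(2));
        // Both views share the original backing array: no bytes were copied
        System.out.println(a.hasArray() && a.array() == in.array());
    }
}
```

The per-call cost is the small `view` allocation, which is what shows up in the profile when no copying happens.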


Thanks
Chang




Re: Performance of VectorizedRleValuesReader

2020-09-14 Thread Ryan Blue
Before, the input was a byte array so we could read from it directly. Now,
the input is a `ByteBufferInputStream` so that Parquet can choose how to
allocate buffers. For example, we use vectored reads from S3 that pull back
multiple buffers in parallel.

Now that the input is a stream based on possibly multiple byte buffers, it
provides a method to get a buffer of a certain length. In most cases, that
will create a ByteBuffer with the same backing byte array, but it may need
to copy if the request spans multiple buffers in the stream. Most of the
time, the call to `slice` only requires duplicating the buffer and setting
its limit, but a read that spans multiple buffers is expensive. It would be
helpful to know whether the time spent is copying data, which would
indicate the backing buffers are too small, or whether it is spent
duplicating the backing byte buffer.
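To make the two costs concrete, here is a hedged sketch of a multi-buffer stream's slice (hypothetical, not Parquet's actual ByteBufferInputStream): a zero-copy duplicate when the request fits in the current buffer, and an allocate-and-copy when it spans buffers.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class MultiBufferSliceSketch {
    // Hypothetical sketch of a stream backed by multiple ByteBuffers.
    private final Deque<ByteBuffer> buffers = new ArrayDeque<>();

    MultiBufferSliceSketch(ByteBuffer... bufs) {
        for (ByteBuffer b : bufs) buffers.add(b);
    }

    ByteBuffer slice(int len) {
        ByteBuffer cur = buffers.peekFirst();
        if (cur.remaining() >= len) {
            // Fast path: duplicate the view and set its limit; no data copy.
            ByteBuffer view = cur.duplicate();
            view.limit(view.position() + len);
            cur.position(cur.position() + len);
            return view;
        }
        // Slow path: the request spans buffers, so allocate and copy bytes.
        ByteBuffer out = ByteBuffer.allocate(len);
        while (out.hasRemaining()) {
            cur = buffers.peekFirst();
            while (out.hasRemaining() && cur.hasRemaining()) out.put(cur.get());
            if (!cur.hasRemaining()) buffers.pollFirst();
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer b1 = ByteBuffer.wrap(new byte[]{1, 2, 3});
        ByteBuffer b2 = ByteBuffer.wrap(new byte[]{4, 5, 6});
        MultiBufferSliceSketch in = new MultiBufferSliceSketch(b1, b2);
        ByteBuffer fast = in.slice(2);                      // fits in b1: zero-copy view
        ByteBuffer slow = in.slice(2);                      // spans b1 and b2: copied
        System.out.println(fast.array() == b1.array());     // shares b1's array
        System.out.println(slow.array() == b1.array());     // freshly allocated
        System.out.println(slow.get(0) + "," + slow.get(1));
    }
}
```

Which path dominates depends on how requests align with the backing buffers, which is why larger backing buffers make copies rarer.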



-- 
Ryan Blue


Re: Performance of VectorizedRleValuesReader

2020-09-14 Thread Sean Owen
Ryan, do you happen to have any opinion there? That particular section
was introduced in the Parquet 1.10 update:
https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
It looks like it didn't use to make a ByteBuffer each time, but read from
`in` directly.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Performance of VectorizedRleValuesReader

2020-09-13 Thread Chang Chen
I think we can copy all the encoded data into a ByteBuffer once, and unpack
values in the loop:

  while (valueIndex < this.currentCount) {
    // values are bit packed 8 at a time, so reading bitWidth will always work
    this.packer.unpack8Values(buffer, buffer.position() + valueIndex, this.currentBuffer, valueIndex);
    valueIndex += 8;
  }
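One subtlety if the slice is hoisted out of the loop: the byte offset passed to unpack8Values must advance by bitWidth per group of 8 values, not by valueIndex (which counts values, not bytes). A self-contained sketch of the hoisted loop, using a toy stand-in for Parquet's BytePacker and assuming LSB-first bit packing:

```java
import java.nio.ByteBuffer;

public class HoistSliceSketch {
    // Toy stand-in for Parquet's BytePacker.unpack8Values: unpacks 8
    // values of the given bit width, packed LSB-first starting at inPos.
    // Works for bitWidth <= 8 (8 * bitWidth bits fit in a long).
    static void unpack8Values(int bitWidth, ByteBuffer in, int inPos, int[] out, int outPos) {
        long bits = 0;
        for (int i = 0; i < bitWidth; i++) {
            bits |= (in.get(inPos + i) & 0xFFL) << (8 * i);
        }
        int mask = (1 << bitWidth) - 1;
        for (int v = 0; v < 8; v++) {
            out[outPos + v] = (int) (bits >>> (v * bitWidth)) & mask;
        }
    }

    public static void main(String[] args) {
        int bitWidth = 2;
        int currentCount = 16;  // two groups of 8 values
        // 16 two-bit values packed into 4 bytes; each 0xE4 byte holds 0,1,2,3
        ByteBuffer buffer = ByteBuffer.wrap(
            new byte[]{(byte) 0xE4, (byte) 0xE4, (byte) 0xE4, (byte) 0xE4});
        int[] currentBuffer = new int[currentCount];

        // Hoisted variant: no per-iteration slice; advance the *byte*
        // offset by bitWidth per group of 8 values.
        int byteOffset = buffer.position();
        for (int valueIndex = 0; valueIndex < currentCount; valueIndex += 8) {
            unpack8Values(bitWidth, buffer, byteOffset, currentBuffer, valueIndex);
            byteOffset += bitWidth;
        }
        System.out.println(java.util.Arrays.toString(currentBuffer));
    }
}
```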



Re: Performance of VectorizedRleValuesReader

2020-09-13 Thread Sean Owen
It certainly can't be called once - it's reading different data each time.
There might be a faster way to do it, I don't know. Do you have ideas?


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Performance of VectorizedRleValuesReader

2020-09-13 Thread Chang Chen
Hi experts,

It looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup():

case PACKED:
  int numGroups = header >>> 1;
  this.currentCount = numGroups * 8;

  if (this.currentBuffer.length < this.currentCount) {
    this.currentBuffer = new int[this.currentCount];
  }
  currentBufferIdx = 0;
  int valueIndex = 0;
  while (valueIndex < this.currentCount) {
    // values are bit packed 8 at a time, so reading bitWidth will always work
    ByteBuffer buffer = in.slice(bitWidth);
    this.packer.unpack8Values(buffer, buffer.position(), this.currentBuffer, valueIndex);
    valueIndex += 8;
  }


Per my profiling, about 30% of readNextGroup()'s time is spent in slice().
Why can't we call slice() outside of the loop?