Re: [petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

Smith, Barry F. via petsc-dev Thu, 11 Jul 2019 08:58:03 -0700


  Sorry, I wasn't clear. Just meant something simpler. Compress the matrix to 
copy it to the GPU for faster transfers (and uncompress it appropriately on the 
GPU).


   Barry


> On Jul 11, 2019, at 10:49 AM, Jed Brown <[email protected]> wrote:
> 
> I don't know anything about zstd (or competitive compression) for GPU,
> but doubt it works at the desired granularity.  I think SpMV on late-gen
> CPUs can be accelerated by zstd column index compression, especially for
> semi-structured problems, but likely also for unstructured problems
> numbered by breadth-first search or similar.  But we'd need to demo that
> use specifically.
> 
> "Smith, Barry F." <[email protected]> writes:
> 
>>  CPU to GPU? Especially matrices?
>> 
>>> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev 
>>> <[email protected]> wrote:
>>> 
>>> Zstd is a remarkably good compressor.  I've experimented with it for
>>> compressing column indices for sparse matrices on structured grids and
>>> (after a simple transform: subtracting the row number) gotten
>>> decompression speed in the neighborhood of 10 GB/s (i.e., faster per
>>> core than DRAM).  I've been meaning to follow up.  The transformation
>>> described below (splitting the bytes) is yielding decompression speed
>>> around 1GB/s (in this link below), which isn't competitive for things
>>> like MatMult, but could be useful for things like trajectory
>>> checkpointing.
>>> 
>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>> 
>>> 
>>> From: "Radev, Martin" <[email protected]>
>>> Subject: Re: Adding a new encoding for FP data
>>> Date: July 11, 2019 at 4:55:03 AM CDT
>>> To: "[email protected]" <[email protected]>
>>> Cc: "Raoofy, Amir" <[email protected]>, "Karlstetter, Roman" 
>>> <[email protected]>
>>> Reply-To: <[email protected]>
>>> 
>>> 
>>> Hello Liya Fan,
>>> 
>>> 
>>> this explains the technique but for a more complex case:
>>> 
>>> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>>> 
>>> For FP data, the approach which seemed to be the best is the following.
>>> 
>>> Say we have a buffer of two 32-bit floating point values:
>>> 
>>> buf = [af, bf]
>>> 
>>> We interpret each FP value as a 32-bit uint and look at each individual 
>>> byte. We have 8 bytes in total for this small input.
>>> 
>>> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>>> 
>>> Then we apply stream splitting and the new buffer becomes:
>>> 
>>> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>>> 
>>> We compress newbuf.
>>> 
>>> Due to similarities the sign bits, mantissa bits and MSB exponent bits, we 
>>> might have a lot more repetitions in data. For scientific data, the 2nd and 
>>> 3rd byte for 32-bit data is probably largely noise. Thus in the original 
>>> representation we would always have a few bytes of data which could appear 
>>> somewhere else in the buffer and then a couple bytes of possible noise. In 
>>> the new representation we have a long stream of data which could compress 
>>> well and then a sequence of noise towards the end.
>>> 
>>> This transformation improved compression ratio as can be seen in the report.
>>> 
>>> It also improved speed for ZSTD. This could be because ZSTD makes a 
>>> decision of how to compress the data - RLE, new huffman tree, huffman tree 
>>> of the previous frame, raw representation. Each can potentially achieve a 
>>> different compression ratio and compression/decompression speed. It turned 
>>> out that when the transformation is applied, zstd would attempt to compress 
>>> fewer frames and copy the other. This could lead to less attempts to build 
>>> a huffman tree. It's hard to pin-point the exact reason.
>>> 
>>> I did not try other lossless text compressors but I expect similar results.
>>> 
>>> For code, I can polish my patches, create a Jira task and submit the 
>>> patches for review.
>>> 
>>> 
>>> Regards,
>>> 
>>> Martin
>>> 
>>> 
>>> ________________________________
>>> From: Fan Liya <[email protected]>
>>> Sent: Thursday, July 11, 2019 11:32:53 AM
>>> To: [email protected]
>>> Cc: Raoofy, Amir; Karlstetter, Roman
>>> Subject: Re: Adding a new encoding for FP data
>>> 
>>> Hi Radev,
>>> 
>>> Thanks for the information. It seems interesting.
>>> IMO, Arrow has much to do for data compression. However, it seems there are
>>> some differences for memory data compression and external storage data
>>> compression.
>>> 
>>> Could you please provide some reference for stream splitting?
>>> 
>>> Best,
>>> Liya Fan
>>> 
>>> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <[email protected]> wrote:
>>> 
>>>> Hello people,
>>>> 
>>>> 
>>>> there has been discussion in the Apache Parquet mailing list on adding a
>>>> new encoder for FP data.
>>>> The reason for this is that the supported compressors by Apache Parquet
>>>> (zstd, gzip, etc) do not compress well raw FP data.
>>>> 
>>>> 
>>>> In my investigation it turns out that a very simple simple technique,
>>>> named stream splitting, can improve the compression ratio and even speed
>>>> for some of the compressors.
>>>> 
>>>> You can read about the results here:
>>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>>> 
>>>> 
>>>> I went through the developer guide for Apache Arrow and wrote a patch to
>>>> add the new encoding and test coverage for it.
>>>> 
>>>> I will polish my patch and work in parallel to extend the Apache Parquet
>>>> format for the new encoding.
>>>> 
>>>> 
>>>> If you have any concerns, please let me know.
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> Martin
>>>> 
>>>> 
>>> 
>>>

Re: [petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

Reply via email to