Hi hackers!

I am a student participating in GSoC 2019, and I am looking forward to working with you all and learning from you. My project aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator. For more details, please take a look at my proposal[0]. Any suggestions or comments on these early ideas would be much appreciated :)
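To make the goal concrete, here is a rough Python sketch of iterator-style de-TOASTing: chunks are fetched and decompressed lazily, so a caller that only needs a prefix can stop early. It is an illustration only, under loud assumptions: zlib stands in for pglz, and `CHUNK_SIZE`, `split_into_chunks`, `CountingFetcher`, `detoast_iter`, and `read_prefix` are hypothetical names, not PostgreSQL APIs.

```python
import zlib
from typing import Iterable, Iterator, List

CHUNK_SIZE = 2000  # hypothetical chunk size, chosen only for illustration


def split_into_chunks(compressed: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a compressed value into fixed-size chunks, as external storage might."""
    return [compressed[i:i + chunk_size] for i in range(0, len(compressed), chunk_size)]


class CountingFetcher:
    """Hands out chunks one at a time and counts how many were fetched."""

    def __init__(self, chunks: List[bytes]) -> None:
        self.chunks = chunks
        self.fetched = 0

    def __iter__(self) -> Iterator[bytes]:
        for chunk in self.chunks:
            self.fetched += 1
            yield chunk


def detoast_iter(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed pieces lazily, one input chunk at a time.

    zlib is a stand-in here; the real project targets pglz.
    """
    decomp = zlib.decompressobj()
    for chunk in chunks:
        piece = decomp.decompress(chunk)
        if piece:
            yield piece
    tail = decomp.flush()
    if tail:
        yield tail


def read_prefix(chunks: Iterable[bytes], length: int) -> bytes:
    """Collect the first `length` raw bytes, stopping once enough is available."""
    out = bytearray()
    for piece in detoast_iter(chunks):
        out += piece
        if len(out) >= length:
            break  # later chunks are never fetched or decompressed
    return bytes(out[:length])


if __name__ == "__main__":
    raw = b"".join(b"row %06d: some repetitive payload\n" % i for i in range(20000))
    fetcher = CountingFetcher(split_into_chunks(zlib.compress(raw)))
    prefix = read_prefix(fetcher, 100)
    assert prefix == raw[:100]
    print(f"fetched {fetcher.fetched} of {len(fetcher.chunks)} chunks")
```

Because both the fetcher and the decompressor are generators, breaking out of the loop in `read_prefix` leaves the remaining chunks untouched, which is the saving the iterator approach is after.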
I've implemented the first step of the project: segmented pglz compression, which makes it possible to get a subset of the raw data without decompressing the entire field. I've also run some tests[1] on the compressor. The results are as follows:

NOTICE: Test summary:

NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 23.747444 | 0.578344 | 0.159809
NOTICE: pglz_decompress_hacked8 | 23.764193 | 0.677800 | 0.159809
NOTICE: pglz_decompress_hacked16 | 23.740351 | 0.704730 | 0.159809
NOTICE: pglz_decompress_vanilla | 23.797917 | 1.227868 | 0.159809
NOTICE: pglz_decompress_hacked_seg | 12.261808 | 0.625634 | 0.184952

Comment: compression speed nearly doubled (from about 23.8 to 12.3 ns/bit), while the compression ratio got roughly 15% worse (0.1598 -> 0.1850).

NOTICE: Payload 000000010000000000000001 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 12.616956 | 0.621223 | 0.156953
NOTICE: pglz_decompress_hacked8 | 12.583685 | 0.756741 | 0.156953
NOTICE: pglz_decompress_hacked16 | 12.512636 | 0.774980 | 0.156953
NOTICE: pglz_decompress_vanilla | 12.493062 | 1.262820 | 0.156953
NOTICE: pglz_decompress_hacked_seg | 11.986554 | 0.622654 | 0.159590

NOTICE: Payload 000000010000000000000001 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 15.514469 | 0.565565 | 0.154213
NOTICE: pglz_decompress_hacked8 | 15.529144 | 0.699675 | 0.154213
NOTICE: pglz_decompress_hacked16 | 15.514040 | 0.721145 | 0.154213
NOTICE: pglz_decompress_vanilla | 15.558958 | 1.237237 | 0.154213
NOTICE: pglz_decompress_hacked_seg | 14.650309 | 0.563228 | 0.153652

NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 8.610177 | 0.153577 | 0.052294
NOTICE: pglz_decompress_hacked8 | 8.566785 | 0.168002 | 0.052294
NOTICE: pglz_decompress_hacked16 | 8.643126 | 0.167537 | 0.052294
NOTICE: pglz_decompress_vanilla | 8.574498 | 0.930738 | 0.052294
NOTICE: pglz_decompress_hacked_seg | 7.394731 | 0.171924 | 0.056081

NOTICE: Payload 000000010000000000000006 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 6.724060 | 0.295043 | 0.065541
NOTICE: pglz_decompress_hacked8 | 6.623018 | 0.318527 | 0.065541
NOTICE: pglz_decompress_hacked16 | 6.898034 | 0.318360 | 0.065541
NOTICE: pglz_decompress_vanilla | 6.712711 | 1.045430 | 0.065541
NOTICE: pglz_decompress_hacked_seg | 6.630743 | 0.302589 | 0.068471

NOTICE: Payload 000000010000000000000006 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 6.624067 | 0.220942 | 0.058865
NOTICE: pglz_decompress_hacked8 | 6.659424 | 0.240183 | 0.058865
NOTICE: pglz_decompress_hacked16 | 6.763864 | 0.240564 | 0.058865
NOTICE: pglz_decompress_vanilla | 6.743574 | 0.985348 | 0.058865
NOTICE: pglz_decompress_hacked_seg | 6.613123 | 0.227582 | 0.060330

NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 52.425957 | 1.050544 | 0.498941
NOTICE: pglz_decompress_hacked8 | 52.204561 | 1.261592 | 0.498941
NOTICE: pglz_decompress_hacked16 | 52.328491 | 1.466751 | 0.498941
NOTICE: pglz_decompress_vanilla | 52.465308 | 1.341271 | 0.498941
NOTICE: pglz_decompress_hacked_seg | 31.896341 | 1.113260 | 0.600998

NOTICE: Payload 000000010000000000000008 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 30.620611 | 0.768542 | 0.351941
NOTICE: pglz_decompress_hacked8 | 30.557334 | 0.907421 | 0.351941
NOTICE: pglz_decompress_hacked16 | 32.064903 | 1.208913 | 0.351941
NOTICE: pglz_decompress_vanilla | 30.489886 | 1.014197 | 0.351941
NOTICE: pglz_decompress_hacked_seg | 27.145243 | 0.774193 | 0.352868

NOTICE: Payload 000000010000000000000008 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 36.567903 | 1.054633 | 0.514047
NOTICE: pglz_decompress_hacked8 | 36.459124 | 1.267731 | 0.514047
NOTICE: pglz_decompress_hacked16 | 36.791718 | 1.479650 | 0.514047
NOTICE: pglz_decompress_vanilla | 36.241913 | 1.303136 | 0.514047
NOTICE: pglz_decompress_hacked_seg | 31.526327 | 1.059926 | 0.526875

NOTICE: Payload 16398
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 9.508625 | 0.435190 | 0.071816
NOTICE: pglz_decompress_hacked8 | 9.546987 | 0.473871 | 0.071816
NOTICE: pglz_decompress_hacked16 | 9.534496 | 0.471662 | 0.071816
NOTICE: pglz_decompress_vanilla | 9.559053 | 1.352561 | 0.071816
NOTICE: pglz_decompress_hacked_seg | 8.479486 | 0.441536 | 0.073232

NOTICE: Payload 16398 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 6.808167 | 0.326570 | 0.082775
NOTICE: pglz_decompress_hacked8 | 6.790743 | 0.361720 | 0.082775
NOTICE: pglz_decompress_hacked16 | 6.886097 | 0.364549 | 0.082775
NOTICE: pglz_decompress_vanilla | 6.918429 | 1.191265 | 0.082775
NOTICE: pglz_decompress_hacked_seg | 6.752811 | 0.340805 | 0.085705

NOTICE: Payload 16398 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 7.244472 | 0.261872 | 0.076860
NOTICE: pglz_decompress_hacked8 | 7.290275 | 0.295988 | 0.076860
NOTICE: pglz_decompress_hacked16 | 7.340706 | 0.294683 | 0.076860
NOTICE: pglz_decompress_vanilla | 7.429289 | 1.151645 | 0.076860
NOTICE: pglz_decompress_hacked_seg | 7.054166 | 0.267896 | 0.078325

NOTICE: Payload shakespeare.txt
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 25.998753 | 1.345542 | 0.281363
NOTICE: pglz_decompress_hacked8 | 26.121630 | 1.917667 | 0.281363
NOTICE: pglz_decompress_hacked16 | 26.139312 | 2.101329 | 0.281363
NOTICE: pglz_decompress_vanilla | 26.155571 | 2.082123 | 0.281363
NOTICE: pglz_decompress_hacked_seg | 16.792089 | 1.951269 | 0.436558

Comment: in this case the compression ratio is dramatically worse (0.2814 -> 0.4366).

NOTICE: Payload shakespeare.txt sliced by 2Kb
NOTICE: pglz_decompress_hacked | 14.992793 | 1.923663 | 0.436270
NOTICE: pglz_decompress_hacked8 | 14.982428 | 2.695319 | 0.436270
NOTICE: pglz_decompress_hacked16 | 15.211803 | 2.846615 | 0.436270
NOTICE: pglz_decompress_vanilla | 15.113214 | 2.580098 | 0.436270
NOTICE: pglz_decompress_hacked_seg | 15.120852 | 1.922596 | 0.439199

NOTICE: Payload shakespeare.txt sliced by 4Kb
NOTICE: pglz_decompress_hacked | 18.083400 | 1.687598 | 0.366936
NOTICE: pglz_decompress_hacked8 | 18.185038 | 2.395928 | 0.366936
NOTICE: pglz_decompress_hacked16 | 18.096120 | 2.554812 | 0.366936
NOTICE: pglz_decompress_vanilla | 18.435380 | 2.329129 | 0.366936
NOTICE: pglz_decompress_hacked_seg | 18.103267 | 1.705517 | 0.368400

NOTICE: Decompressor score (summ of all times):
NOTICE: Decompressor pglz_decompress_hacked result 11.288848
NOTICE: Decompressor pglz_decompress_hacked8 result 14.438165
NOTICE: Decompressor pglz_decompress_hacked16 result 15.716280
NOTICE: Decompressor pglz_decompress_vanilla result 21.034867
NOTICE: Decompressor pglz_decompress_hacked_seg result 12.090609

NOTICE: compressor score (summ of all times):
NOTICE: compressor pglz_compress_vanilla result 276.776671
NOTICE: compressor pglz_compress_hacked_seg result 222.407850

There are some open questions now:

1. The segmented compression format is currently not compatible with the original pglz format.
2. If the idea works, we will need to test on more data. What kind of data would be most appropriate?

Any comments are much appreciated.

Best regards,
Binguo Bao

[0] https://docs.google.com/document/d/1V4oXV5vGrGx24deBTKKM7bVdO3Cy-zfj-wQ4dkBUCl4/edit
[1] https://github.com/djydewang/test_pglz