[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490174#comment-17490174 ]

Adrien Grand commented on LUCENE-8739:

My opinion is that there are interesting benefits, but they are not worth the cost of adding an extra dependency on the library that provides the JNI bindings. Sure it performs better on retrieval than BEST_COMPRESSION, but if retrieval is what a user cares most about then BEST_SPEED is an even better option.

> ZSTD Compressor support in Lucene
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/codecs
> Reporter: Sean Torres
> Priority: Minor
> Labels: features
> Attachments: image-2022-01-11-02-18-11-402.png, image-2022-01-11-02-18-57-752.png
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff.
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490161#comment-17490161 ]

Praveen Nishchal commented on LUCENE-8739:

Hi Adrien,

Thank you for your feedback! I am a little unclear as to why we should wait for Panama before adding a new JNI-based codec. That codec will not be part of Lucene core; as mentioned, it will be an unofficial codec under lucene/codecs. Given the tremendous performance benefits, shouldn't users be allowed to use JNI in their deployments if they choose to?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486619#comment-17486619 ]

Adrien Grand commented on LUCENE-8739:

Robert disagreed with introducing a requirement on libzstd for the default codec, which makes sense. We could still make it an unofficial codec under lucene/codecs when Panama lands.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486614#comment-17486614 ]

Praveen Nishchal commented on LUCENE-8739:

As we observed earlier, Zstd is on par with vanilla and Cloudflare zlib in terms of compression ratio, while giving a significant gain in retrieval time. I have set the default compression level to 6 (it remains a configurable parameter), with a 48KB block size and an 8KB dictionary. Any additional comments?

This solution is part of a custom codec and will allow users to use ZSTD on their data. We can revisit the idea of adding it to Lucene core in the future, when Project Panama lands.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472298#comment-17472298 ]

Praveen Nishchal commented on LUCENE-8739:

Hi [~rcmuir]

That is exactly what I am doing :) CustomCompressionCodec is inside lucene/codecs (same location as SimpleTextCodec) and reuses Lucene90CompressingStoredFieldsFormat to improve stored fields compression using zstd. The idea is to let users choose the compression algorithm and also bring their own compression algorithm via CustomCompressionCodec. Currently it supports zstd only.

[https://github.com/apache/lucene/pull/439]

Zstd has impressed me by being *37%* faster than Cloudflare zlib and *54%* faster than vanilla zlib in terms of retrieval time, while slightly outperforming both in terms of compression ratio at compression level 6.

!image-2022-01-11-02-18-11-402.png|width=441,height=118!

!image-2022-01-11-02-18-57-752.png|width=448,height=44!
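As a sanity check on the percentages quoted above, here is a minimal sketch of the arithmetic. It assumes the figures were derived from the retrieval times in Adrien's benchmark table elsewhere in this thread (ZSTD dict level=6 against the two zlib builds); the comment itself does not say which numbers were used, and the class and method names are made up for illustration.

```java
final class RetrievalSpeedup {
  /** Percentage reduction in time when moving from baselineMs to candidateMs. */
  static double percentFaster(double baselineMs, double candidateMs) {
    return 100.0 * (baselineMs - candidateMs) / baselineMs;
  }

  public static void main(String[] args) {
    // Retrieval time per 10k docs (ms), copied from the benchmark table in this thread.
    double zstdDictLevel6 = 881.69049;
    double cloudflareZlib = 1395.53593;
    double vanillaZlib = 1910.42106;

    System.out.printf("vs Cloudflare zlib: %.1f%% faster%n",
        percentFaster(cloudflareZlib, zstdDictLevel6));
    System.out.printf("vs vanilla zlib:    %.1f%% faster%n",
        percentFaster(vanillaZlib, zstdDictLevel6));
  }
}
```

Under that assumption, the two results round to the 37% and 54% mentioned in the comment.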
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471315#comment-17471315 ]

Robert Muir commented on LUCENE-8739:

We already have a compression abstraction in lucene: CompressingCodec etc. Can we avoid adding another one?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470999#comment-17470999 ]

Praveen Nishchal commented on LUCENE-8739:

Hi [~rcmuir]

This is why I have created a custom codec outside of Lucene core, in the same place as SimpleTextCodec, to give Lucene users the option to use zstd and also to bring in other compression algorithms.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470624#comment-17470624 ]

Robert Muir commented on LUCENE-8739:

> +1 ZSTD is quite great. I wouldn't use it in the Lucene default codec yet,
> because lucene-core shouldn't have dependencies and we don't want to use JNI
> in the lucene-core build. Maybe we can reconsider when Project Panama lands
> and it gets easier to interact with native libraries.

IMO this applies to native libraries too though. I'd disagree with lucene not working correctly depending upon existence or version of libzstd.so on the machine.

The performance/space tradeoffs are not particularly compelling to me to be worth the native-library hassle right now. Level 4 is the only one slightly interesting, as it would give compression similar to BEST_COMPRESSION with indexing time similar to BEST_SPEED, but still the retrieval is slow. And the differences compared to cloudflare zlib aren't that big.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470506#comment-17470506 ]

Praveen Nishchal commented on LUCENE-8739:

WOW! That's a lot of wonderful feedback here :) I started working on this to give Lucene users an option to use Zstandard for compression/decompression, and it seems to be turning out really well!

I am encouraged by the data Adrien has posted here: Zstandard with a dictionary at level 6 seems to outperform zlib in terms of compression ratio.

I have updated the PR to use a 48KB block size, with the suggested code change.

The custom codec is designed so that any compression level and any block size can be configured. Different use cases may call for a different compression level, for either a better compression ratio or a better compression speed. It is also extensible, so it can provide a new compression algorithm or a different zstd flavor.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468610#comment-17468610 ]

Tobias Ibounig commented on LUCENE-8739:

Ok this all sounds very good.

Just one more thing for further tradeoff considerations: ZSTD also supports negative compression levels (but I don't know how those are exposed in the JNI library), [see benchmark table|https://github.com/facebook/zstd]. So level=-1 could be another option to get closer to the LZ4 retrieval speed of BEST_SPEED.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468498#comment-17468498 ]

Adrien Grand commented on LUCENE-8739:

bq. Would such an increase even make sense or would this cause other issues?

It would require reading more data from disk. This read would be sequential so I suspect it wouldn't hurt much, including on slower I/O. The main drawback is probably that it would trash a bit more of the filesystem cache. That said I agree with you that we should probably look into increasing the block size with ZStandard. I just did a run with 1.5x larger blocks and level=6; it slightly outperforms our current BEST_COMPRESSION mode across indexing time, disk usage and retrieval time.

||Codec||Indexing time (ms)||Disk usage (MB)||Retrieval time per 10k docs (ms)||
|ZSTD dict level=6 1.5x larger blocks|43228|57.455|1269.22127|

bq. Or would 3 presets be too much choice?

IMO it would be too much, but I like the fact that ZSTD could help us have two options for compression that share the exact same read logic, e.g. if we replaced BEST_SPEED with what you suggested for BALANCED: low level ZSTD compression with a small block size.

bq. Anyway I see potential for good tradeoffs here.

+1 ZSTD is quite great. I wouldn't use it in the Lucene default codec yet, because lucene-core shouldn't have dependencies and we don't want to use JNI in the lucene-core build. Maybe we can reconsider when Project Panama lands and it gets easier to interact with native libraries.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468482#comment-17468482 ]

Tobias Ibounig commented on LUCENE-8739:

Would it make sense to increase the block size until retrieval times approach those of zlib (between CF and vanilla)? Would such an increase even make sense or would this cause other issues?

Then there could also be 3 presets:

BEST_SPEED --> stays LZ4
BALANCED --> low level ZSTD + dict (maybe even a slightly smaller block size, for slightly faster retrieval)
BEST_COMPRESSION --> ZSTD with a larger block size and a higher level (maybe 5-9)

Or would 3 presets be too much choice? Anyway I see potential for good tradeoffs here.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468417#comment-17468417 ]

Adrien Grand commented on LUCENE-8739:

I updated block sizes so that ZSTD uses the same block sizes as BEST_COMPRESSION and it looks much better now.

||Codec||Indexing time (ms)||Disk usage (MB)||Retrieval time per 10k docs (ms)||
|BEST_SPEED (LZ4 with small blocks)|35383|90.175|190.17524|
|BEST_COMPRESSION (vanilla zlib, DEFLATE level=6)|76671|58.682|1910.42106|
|BEST_COMPRESSION (Cloudflare zlib, DEFLATE level=6)|54791|58.601|1395.53593|
|ZSTD dict (level=1)|24687|63.324|928.73997|
|ZSTD dict (level=2)|24934|63.722|977.29911|
|ZSTD dict (level=3)|28285|62.072|938.10886|
|ZSTD dict (level=4)|37863|60.427|969.18655|
|ZSTD dict (level=5)|45479|59.317|941.20922|
|ZSTD dict (level=6)|57842|58.481|881.69049|
|ZSTD dict (level=7)|65796|58.107|886.42249|

On this dataset, the main benefit seems to be the retrieval speed. Regarding indexing times and space efficiency, either you go with level 5 and you are faster to index data but less space-efficient than DEFLATE (with the Cloudflare zlib), or you go with level 6 and you are more space-efficient but slower to index.
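The level 5 versus level 6 tradeoff described in this comment can be checked mechanically. The sketch below hard-codes the ZSTD dict rows and the Cloudflare zlib row from the table and lists, per metric, which ZSTD levels beat that baseline. The class name, method name and column layout are illustrative assumptions, not part of the benchmark or the PR.

```java
import java.util.ArrayList;
import java.util.List;

final class ZstdTradeoff {
  // Columns: level, indexing time (ms), disk usage (MB), retrieval time per 10k docs (ms).
  // Values copied from the benchmark table in this comment.
  static final double[][] ZSTD_DICT = {
    {1, 24687, 63.324, 928.73997},
    {2, 24934, 63.722, 977.29911},
    {3, 28285, 62.072, 938.10886},
    {4, 37863, 60.427, 969.18655},
    {5, 45479, 59.317, 941.20922},
    {6, 57842, 58.481, 881.69049},
    {7, 65796, 58.107, 886.42249},
  };

  // BEST_COMPRESSION with Cloudflare zlib, same column order with a placeholder level of 0.
  static final double[] CLOUDFLARE_ZLIB = {0, 54791, 58.601, 1395.53593};

  /** Returns the ZSTD levels whose value in the given column is lower (better) than the baseline. */
  static List<Integer> levelsBeatingBaseline(int column) {
    List<Integer> levels = new ArrayList<>();
    for (double[] row : ZSTD_DICT) {
      if (row[column] < CLOUDFLARE_ZLIB[column]) {
        levels.add((int) row[0]);
      }
    }
    return levels;
  }

  public static void main(String[] args) {
    System.out.println("faster to index:    " + levelsBeatingBaseline(1));
    System.out.println("smaller on disk:    " + levelsBeatingBaseline(2));
    System.out.println("faster to retrieve: " + levelsBeatingBaseline(3));
  }
}
```

On these numbers no single level wins on both indexing time and disk usage: the "faster to index" list stops at level 5 and the "smaller on disk" list starts at level 6, while every level retrieves faster than Cloudflare zlib.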
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468175#comment-17468175 ]

Adrien Grand commented on LUCENE-8739:

I may have found the issue: your codec was using the same block sizes as BEST_SPEED, which are smaller than the ones used by BEST_COMPRESSION. I left comments on the PR to align block sizes with BEST_COMPRESSION, so that ZSTD is more easily comparable with BEST_COMPRESSION.
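To illustrate why block size matters for the compression ratio, here is a small self-contained sketch. It uses the JDK's DEFLATE (`java.util.zip.Deflater`) as a stand-in, since zstd needs a native library, and compresses the same synthetic data as independent 4KB and 48KB blocks. The data, the block sizes and all names are assumptions chosen for illustration; they are not the codec's actual settings or the geonames dataset.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class BlockSizeDemo {
  /** Compresses one block on its own (no shared history with other blocks) and returns its size. */
  static int compressedSize(byte[] data, int off, int len) {
    Deflater deflater = new Deflater(6);
    deflater.setInput(data, off, len);
    deflater.finish();
    byte[] buf = new byte[4096];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buf);
    }
    deflater.end();
    return total;
  }

  /** Splits data into independent blocks of blockSize bytes and sums their compressed sizes. */
  static int totalCompressedSize(byte[] data, int blockSize) {
    int total = 0;
    for (int off = 0; off < data.length; off += blockSize) {
      total += compressedSize(data, off, Math.min(blockSize, data.length - off));
    }
    return total;
  }

  /** Builds synthetic, made-up records with repeated field names. */
  static byte[] sampleData(int records) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < records; i++) {
      sb.append("id=").append(i)
          .append("\tname=Place ").append(i * 7919 % 1000)
          .append("\tcountry=").append(i % 2 == 0 ? "FR" : "DE")
          .append("\ttimezone=Europe/Paris\n");
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] data = sampleData(5000);
    System.out.println("input bytes: " + data.length);
    System.out.println("4KB blocks:  " + totalCompressedSize(data, 4 * 1024));
    System.out.println("48KB blocks: " + totalCompressedSize(data, 48 * 1024));
  }
}
```

Each block starts with no history, so smaller blocks pay the per-block overhead more often and find fewer back-references. The cost of larger blocks is that more data has to be decompressed to retrieve a single document.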
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468137#comment-17468137 ]

Adrien Grand commented on LUCENE-8739:

I ran the same benchmark over the above PR with the dictionary mode.

||Codec||Indexing time (ms)||Disk usage (MB)||Retrieval time per 10k docs (ms)||
|BEST_SPEED|35383|90.175|190.17524|
|BEST_COMPRESSION (vanilla zlib)|76671|58.682|1910.42106|
|BEST_COMPRESSION (Cloudflare zlib)|54791|58.601|1395.53593|
|ZSTD (level=1)|42433|70.527|240.04036|
|ZSTD (level=3)|53426|68.737|259.61897|
|ZSTD (level=6)|100697|66.283|251.91177|
|ZSTD dict (level=1)|50571|69.860|254.10496|
|ZSTD dict (level=3)|60580|68.690|266.72929|
|ZSTD dict (level=6)|128322|65.605|251.91177|

Compression ratios are a bit disappointing. I wonder if this is because DEFLATE outperforms ZSTD on this sort of data or because there is a bug in your contribution?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465718#comment-17465718 ]

Praveen Nishchal commented on LUCENE-8739:

Added dictionary support for Zstandard - https://github.com/apache/lucene/pull/439
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443976#comment-17443976 ]

Adrien Grand commented on LUCENE-8739:

Side thought: it would be nice to use Project Panama's Foreign linker when it gets released instead of depending on this JNI library.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443964#comment-17443964 ]

Adrien Grand commented on LUCENE-8739:

I ran your PR with the new stored fields benchmark to see how codecs compare:

||Codec||Indexing time (ms)||Disk usage (MB)||Retrieval time per 10k docs (ms)||
|BEST_SPEED|35383|90.175|190.17524|
|BEST_COMPRESSION (vanilla zlib)|76671|58.682|1910.42106|
|BEST_COMPRESSION (Cloudflare zlib)|54791|58.601|1395.53593|
|ZSTD (level=1)|42433|70.527|240.04036|
|ZSTD (level=3)|53426|68.737|259.61897|
|ZSTD (level=6)|100697|66.283|251.91177|

From a quick look at your PR, it looks like you are not using dictionaries, which would explain why we're seeing a worse compression ratio?
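For readers unfamiliar with preset dictionaries, the sketch below shows the general idea using only the JDK: `Deflater.setDictionary` and `Inflater.setDictionary` from `java.util.zip`. DEFLATE is used here as a stand-in for zstd, and the dictionary and document strings are made up, so this illustrates the technique rather than what the PR or Lucene does internally.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class PresetDictDemo {
  /** Compresses data with DEFLATE level 6; dict may be null for plain compression. */
  static byte[] deflate(byte[] data, byte[] dict) {
    Deflater deflater = new Deflater(6);
    if (dict != null) {
      deflater.setDictionary(dict); // seeds the match history before any input is seen
    }
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Decompresses data; dict must be the dictionary used for compression (or null if none was used). */
  static byte[] inflate(byte[] compressed, byte[] dict) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0) {
          if (inflater.needsDictionary()) {
            inflater.setDictionary(dict); // the stream asks for the same dictionary
          } else if (inflater.needsInput()) {
            throw new IllegalArgumentException("truncated input");
          }
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt input", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // Hypothetical dictionary made of strings that recur in every document.
    byte[] dict = "name=\tcountry=\ttimezone=Europe/\tpopulation="
        .getBytes(StandardCharsets.UTF_8);
    byte[] doc = "name=Lyon\tcountry=FR\ttimezone=Europe/Paris\tpopulation=516092"
        .getBytes(StandardCharsets.UTF_8);

    byte[] plain = deflate(doc, null);
    byte[] withDict = deflate(doc, dict);
    System.out.println("without dictionary: " + plain.length + " bytes");
    System.out.println("with dictionary:    " + withDict.length + " bytes");
    System.out.println("round trip ok: "
        + new String(inflate(withDict, dict), StandardCharsets.UTF_8).equals(
            new String(doc, StandardCharsets.UTF_8)));
  }
}
```

A short document on its own contains few repeats, so the compressor has little to work with. The dictionary supplies that history up front, which is why it helps most when blocks are small.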
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443912#comment-17443912 ]

Praveen Nishchal commented on LUCENE-8739:

I have created a pull request - [https://github.com/apache/lucene/pull/439]

I am using Zstd-JNI [https://github.com/luben/zstd-jni] in a new custom codec which integrates Zstd compression and decompression into StoredFieldsFormat.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432677#comment-17432677 ]

Adrien Grand commented on LUCENE-8739:

You need to download https://download.geonames.org/export/dump/allCountries.zip, unzip it and then use it to run the above benchmark, which is a simple standalone Java class with a main method. To run it with your own codec, you will need to modify the code a bit to use it rather than Lucene's default codec.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432590#comment-17432590 ]

Praveen Nishchal commented on LUCENE-8739:

Hi Adrien,

Could you please explain how to compare my stored fields format against Lucene's built-in formats?

Thanks!
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432589#comment-17432589 ]

Praveen Nishchal commented on LUCENE-8739:

Hi Mike,

The run with -Dtests.nightly=true completed successfully; it took more than an hour!
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432506#comment-17432506 ]

Adrien Grand commented on LUCENE-8739:

You might be interested in the new simple benchmark for stored fields that we added to luceneutil to compare your stored fields format against Lucene's built-in formats: https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/StoredFieldsBenchmark.java.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432444#comment-17432444 ]

Michael McCandless commented on LUCENE-8739:

{quote}My codec passed all test cases with test option -Dtests.codec=MyCodec. {quote}

Aha, that is great news! Lucene's tests tend to stress out new Codecs. If you want to evil-up the tests, pass {{-Dtests.nightly=true}}. The tests will run longer but try harder to find problems.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432288#comment-17432288 ]

Praveen Nishchal commented on LUCENE-8739:

Hi Mike, my codec passed all test cases with the test option -Dtests.codec=MyCodec. I am now working on the luceneutil benchmark. Thanks for your reply on the dev community thread!
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427037#comment-17427037 ]

Praveen Nishchal commented on LUCENE-8739:

Hi Mike, I see Adrien has used a JNA-based Zstd implementation, while I have taken the JNI approach. I am working on running all tests with the option -Dtests.codec=MyCodec. The data above was obtained by running the Lucene benchmark under high load over the Reuters corpus. Should I also capture luceneutil benchmark results? While running luceneutil, I observed a few discrepancies in the stats, for which I raised an issue to clarify (ref #142). Please guide!
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422815#comment-17422815 ]

Michael McCandless commented on LUCENE-8739:

Wow, these are compelling results! Can you try running all Lucene unit tests with your new Codec? Something like {{-Dtests.codec=MyCodec}}. That is a great way to stress out a new Codec to look for any problems. Every test (except those that require a specific Codec) will exercise yours.

How does your ([~pru30]) approach compare to [~jpountz]'s? Have you tried running {{luceneutil}} benchmarks with this new Codec? I'm very curious how it behaves on a larger corpus (English Wikipedia)...
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415363#comment-17415363 ]

Praveen Nishchal commented on LUCENE-8739:

I have developed a new custom codec that integrates Zstd compression and decompression into the StoredFieldsFormat only. It uses Zstd-JNI ([https://github.com/luben/zstd-jni]). Running the index and search benchmarks on the reuters21578 corpus (plain-text documents derived from reuters21578), I made the following high-level observations:

# Zstd provides a better compression ratio than LZ4: the indexing run produced a .fdt (stored fields data) file 30% smaller than with LZ4.
# Indexing throughput with Zstd is almost the same as with LZ4.
# Search QPS with Zstd is 6% higher than with LZ4.

The implementation is written in Java and uses no dictionary compression/decompression, the default compression level of 3, and a 600 KB chunk size (10 * 60 * 1024, the same as LZ4).

With all these observations, a Zstd option alongside LZ4 and DEFLATE looks promising! Kindly share your thoughts!
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358574#comment-17358574 ]

Adrien Grand commented on LUCENE-8739:

I opened a PR that uses the exact same approach and block sizes as the default codec with DEFLATE, but uses ZSTD instead. It calls ZSTD through JNA, so libzstd needs to be installed locally.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357132#comment-17357132 ]

Praveen Nishchal commented on LUCENE-8739:

Zstd JNI (https://github.com/luben/zstd-jni) looks very promising and is used in Cassandra, Kafka and other popular Apache projects. Can we create a custom codec using Zstd JNI in the codecs folder - https://github.com/apache/lucene/tree/main/lucene/codecs/src/java/org/apache/lucene/codecs ?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324904#comment-17324904 ]

Adrien Grand commented on LUCENE-8739:

Hi [~wicked1099], force-merging wouldn't change anything: we still compress data into small chunks of ~48kB in order to be able to decompress as little as possible when reading a single stored document.

We don't like introducing options in the default codec because it makes backward compatibility too hard and prevents us from moving forward. Expert users can still create their own codec if they wish to.
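
To make the chunking argument above concrete, here is a minimal sketch of the general idea, assuming a toy in-memory store. The class ChunkedStoreDemo and all of its members are hypothetical and are not Lucene's stored fields code; the sketch only uses the JDK's java.util.zip classes. Documents are appended to a buffer, the buffer is compressed as an independent chunk once it reaches a target size, and reading one document inflates only the chunk that holds it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration of chunked compression; not Lucene's actual implementation. */
final class ChunkedStoreDemo {
  private final int chunkSize; // target uncompressed bytes per chunk
  private final List<byte[]> chunks = new ArrayList<>(); // compressed chunks
  private final List<Integer> rawLengths = new ArrayList<>(); // uncompressed size of each chunk
  private final List<int[]> docIndex = new ArrayList<>(); // per doc: {chunk, offset, length}
  private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
  long bytesDecompressed = 0; // counts how much data reads had to inflate

  ChunkedStoreDemo(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  /** Appends a document to the pending chunk and compresses the chunk once it is full. */
  void add(String doc) {
    byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
    docIndex.add(new int[] {chunks.size(), pending.size(), bytes.length});
    pending.write(bytes, 0, bytes.length);
    if (pending.size() >= chunkSize) {
      flush();
    }
  }

  /** Compresses the pending chunk on its own; must be called before reading recent documents. */
  void flush() {
    if (pending.size() == 0) {
      return;
    }
    byte[] raw = pending.toByteArray();
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    while (!deflater.finished()) {
      out.write(buffer, 0, deflater.deflate(buffer));
    }
    deflater.end();
    chunks.add(out.toByteArray());
    rawLengths.add(raw.length);
    pending.reset();
  }

  /** Reads one document by inflating only the chunk that contains it. */
  String get(int docId) {
    int[] entry = docIndex.get(docId);
    byte[] raw = new byte[rawLengths.get(entry[0])];
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(chunks.get(entry[0]));
      int n = 0;
      while (n < raw.length) {
        int read = inflater.inflate(raw, n, raw.length - n);
        if (read == 0 && (inflater.finished() || inflater.needsInput())) {
          throw new IllegalStateException("truncated chunk");
        }
        n += read;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    bytesDecompressed += raw.length;
    return new String(raw, entry[1], entry[2], StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ChunkedStoreDemo store = new ChunkedStoreDemo(48 * 1024);
    for (int i = 0; i < 100_000; i++) {
      store.add("doc-" + i + " some stored field text");
    }
    store.flush();
    System.out.println(store.get(50_000));
    System.out.println("bytes inflated for one read: " + store.bytesDecompressed);
  }
}
```

Because every chunk is compressed independently, merging segments does not let the compressor see more context than one chunk, which is the reason a force merge would not change the ratio in a design like this one.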
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324662#comment-17324662 ]

Sean Torres commented on LUCENE-8739:

If the current runtime compression is comparable to DEFLATE, I would also be interested in the gains from ZSTD after a forceMerge of segments is performed. I believe the use case would differ based on the workload and data set used. However, I believe this would be worth including as an option that each user can decide to enable on their own.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324044#comment-17324044 ]

Sean Torres commented on LUCENE-8739:

How about the performance and storage cost once a force merge action has been performed?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323762#comment-17323762 ]

Dawid Weiss commented on LUCENE-8739:

Because it would make things very difficult for everyone who embeds Lucene - this is a low-level library; Java dependencies are a nightmare to maintain.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322945#comment-17322945 ]

Praveen Nishchal commented on LUCENE-8739:

Hi [~jpountz], Kindly help us understand why lucene-core can't have dependencies?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322015#comment-17322015 ]

Adrien Grand commented on LUCENE-8739:

I forgot to update this issue but I actually played with ZSTD a few months ago using JNA. I have a dirty, ugly, untested branch at https://github.com/jpountz/lucene-solr/tree/zstd if you are curious.

The results were good but not as appealing as benchmarks that work on whole files. It seems to me that most of the compression gains of ZSTD compared to Deflate come from the larger sliding window that it uses at compression time (Deflate can only deduplicate strings that occur within 32kB of each other). But given how Lucene splits stored fields into small-ish blocks anyway in order to keep decompression fast, ZSTD didn't yield much smaller indexes.

Regarding compression/decompression speed, ZSTD did perform better than vanilla DEFLATE, but most of this gap can actually be filled by using a DEFLATE variant that vectorizes the slowest bits like Cloudflare's DEFLATE, which can be done on the default codec by putting the other DEFLATE variant on the LD_LIBRARY_PATH.
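
The sliding-window point above can be checked with the JDK alone. The sketch below is an illustration under stated assumptions, not a benchmark: the class DeflateWindowDemo and its methods are hypothetical names, and the input is synthetic random data. It places the same 4 KiB block twice in a buffer, separated by a gap of noise, and reports how many bytes java.util.zip.Deflater saves. The DEFLATE format limits back-references to 32 KiB, so the repeat should only pay off when the gap keeps both copies within that distance.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Hypothetical demo class; not part of Lucene. */
final class DeflateWindowDemo {

  /** Returns the size in bytes of the input after DEFLATE compression at the default level. */
  static int compressedSize(byte[] input) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buffer = new byte[8192];
      int total = 0;
      while (!deflater.finished()) {
        total += deflater.deflate(buffer);
      }
      return total;
    } finally {
      deflater.end();
    }
  }

  /**
   * Builds [block][gapBytes of noise][same block] and returns how many bytes compression saved
   * (the result can be slightly negative when nothing compresses).
   */
  static int savings(int gapBytes) {
    Random random = new Random(42); // fixed seed so runs are repeatable
    byte[] block = new byte[4096];
    random.nextBytes(block);
    byte[] input = new byte[2 * block.length + gapBytes];
    random.nextBytes(input); // fill everything with incompressible noise
    System.arraycopy(block, 0, input, 0, block.length);
    System.arraycopy(block, 0, input, block.length + gapBytes, block.length);
    return input.length - compressedSize(input);
  }

  public static void main(String[] args) {
    System.out.println("repeat with  8 KiB gap: saved " + savings(8 * 1024) + " bytes");
    System.out.println("repeat with 64 KiB gap: saved " + savings(64 * 1024) + " bytes");
  }
}
```

With the small gap the second copy is within reach of the window and most of its 4 KiB should be saved; with the large gap the compressor no longer sees the first copy, so little or nothing is saved. A compressor with a larger window would catch both cases, but only if the data it is given is longer than 32 KiB to begin with, which is the limitation small stored-field blocks impose.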
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067798#comment-17067798 ]

Michael McCandless commented on LUCENE-8739:

I think this is worth a deep dive, at least to understand its performance for "typical" Lucene use cases ... I've heard (just anecdotally) that ZSTD shows impressive speed and compression.

That said, the added complexity in implementation is definitely a downside.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997346#comment-16997346 ]

Adrien Grand commented on LUCENE-8739:

As I expected it needs quite a lot of code, compared to the 500 lines we have for LZ4. If you can run benchmarks, I'd be curious, but in general I suspect that the JDK implementation of DEFLATE is more appealing for the kind of trade-offs that zstd provides.