[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-02-10 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490174#comment-17490174
 ] 

Adrien Grand commented on LUCENE-8739:
--

My opinion is that there are interesting benefits, but they are not worth the 
cost of adding an extra dependency on the library that provides the JNI 
bindings. Sure, it performs better on retrieval than BEST_COMPRESSION, but if 
retrieval is what a user cares most about, then BEST_SPEED is an even better 
option.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
> Attachments: image-2022-01-11-02-18-11-402.png, 
> image-2022-01-11-02-18-57-752.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-02-10 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490161#comment-17490161
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Adrien,

Thank you for your feedback! I am a little unclear as to why we should wait for 
Panama before adding a new JNI-based codec. As mentioned, that codec will not 
be part of Lucene core; it will be an unofficial codec under lucene/codecs. 
Given the tremendous performance benefits, shouldn't users be allowed to use 
JNI in their deployments if they choose to?




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-02-03 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486619#comment-17486619
 ] 

Adrien Grand commented on LUCENE-8739:
--

Robert disagreed with introducing a requirement on libzstd for the default 
codec, which makes sense. We could still make it an unofficial codec under 
lucene/codecs when Panama lands.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-02-03 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486614#comment-17486614
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

As we observed earlier, Zstd is on par with vanilla/Cloudflare zlib in terms of 
compression ratio, and at the same time there is a significant gain in 
retrieval time. I have set the default compression level to 6 (it is a 
configurable parameter), with a 48KB block size and an 8KB dictionary. Any 
additional comments?

This solution is part of a custom codec and will allow users to use ZSTD on 
their data. We can revisit the idea of adding it to Lucene core in the future, 
when Project Panama lands.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-10 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472298#comment-17472298
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi [~rcmuir] 

That is exactly what I am doing :)

CustomCompressionCodec is inside lucene/codecs (same location as 
SimpleTextCodec) and reuses Lucene90CompressingStoredFieldsFormat to improve 
stored-field compression using zstd. The idea is to empower users to choose a 
compression algorithm, and also to bring their own compression algorithm, via 
CustomCompressionCodec. Currently it supports zstd only.

[https://github.com/apache/lucene/pull/439]

Zstd has impressed me by being *37%* faster than Cloudflare zlib and *54%* 
faster than vanilla zlib in terms of retrieval time, while slightly 
outperforming both in terms of compression ratio at compression level 6.

!image-2022-01-11-02-18-11-402.png|width=441,height=118!

!image-2022-01-11-02-18-57-752.png|width=448,height=44!
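
Editorial note: the two percentages above can be reproduced from the retrieval times in the 2022-01-04 benchmark table in this thread, assuming "faster" means the relative reduction in retrieval time and that the figures were derived from that table (the attached images are not available here to confirm it). The class and method names below are illustrative only.

```java
// Worked check of the quoted percentages, using retrieval times
// (ms per 10k docs) from the 2022-01-04 benchmark table in this thread.
final class RetrievalGain {
  /** Percentage by which candidateMs is lower than baselineMs. */
  static double percentFaster(double baselineMs, double candidateMs) {
    return 100.0 * (baselineMs - candidateMs) / baselineMs;
  }

  public static void main(String[] args) {
    double zstdDictLevel6 = 881.69049;   // ZSTD dict (level=6)
    double cloudflareZlib = 1395.53593;  // BEST_COMPRESSION (Cloudflare zlib)
    double vanillaZlib = 1910.42106;     // BEST_COMPRESSION (vanilla zlib)

    System.out.println("vs Cloudflare zlib: "
        + percentFaster(cloudflareZlib, zstdDictLevel6) + " %");
    System.out.println("vs vanilla zlib:    "
        + percentFaster(vanillaZlib, zstdDictLevel6) + " %");
  }
}
```

Under that reading the reductions come to roughly 36.8% and 53.8%, which round to the 37% and 54% quoted in the comment.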




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-09 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471315#comment-17471315
 ] 

Robert Muir commented on LUCENE-8739:
-

We already have a compression abstraction in Lucene: CompressingCodec etc. Can 
we avoid adding another one?




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-07 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470999#comment-17470999
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi [~rcmuir] 

This is why I have created a custom codec outside of Lucene core, in the same 
place where SimpleTextCodec lives, to give Lucene users the option to use zstd 
and also to bring in other compression algorithms.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-07 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470624#comment-17470624
 ] 

Robert Muir commented on LUCENE-8739:
-

> +1 ZSTD is quite great. I wouldn't use it in the Lucene default codec yet, 
> because lucene-core shouldn't have dependencies and we don't want to use JNI 
> in the lucene-core build. Maybe we can reconsider when Project Panama lands 
> and it gets easier to interact with native libraries.

IMO this applies to native libraries too, though. I'd disagree with Lucene not 
working correctly depending on the existence or version of libzstd.so on the 
machine. 

The performance/space tradeoffs are not compelling enough to me to be worth 
the native-library hassle right now. Level 4 is the only one that is slightly 
interesting, as it would give compression similar to BEST_COMPRESSION with 
indexing time similar to BEST_SPEED, but retrieval is still slow. And the 
differences compared to Cloudflare zlib aren't that big.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-07 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470506#comment-17470506
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

WOW! That's a lot of wonderful feedback here :)

I started working on this to give Lucene users an option to use Zstandard for 
compression/decompression, and it seems to be turning out really well! I am 
encouraged by the data Adrien has posted here: Zstandard with a dictionary at 
level 6 seems to outperform zlib in terms of compression ratio. 

I have updated the PR to use a 48KB block size with the suggested code change.

The custom codec is designed so that we can introduce any compression level 
and any block size. Different use cases may call for changing the compression 
level for either a better compression ratio or better compression speed. It is 
also extensible, so it can provide a new compression algorithm or a different 
zstd flavor.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Tobias Ibounig (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468610#comment-17468610
 ] 

Tobias Ibounig commented on LUCENE-8739:


OK, this all sounds very good.

Just one more thing for further trade-off considerations:
ZSTD also supports negative compression levels (but I don't know how those are 
exposed in the JNI library), [see benchmark table|https://github.com/facebook/zstd].
So level=-1 could be another option to get closer to LZ4 retrieval speed 
for BEST_SPEED.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468498#comment-17468498
 ] 

Adrien Grand commented on LUCENE-8739:
--

bq. Would such an increase even make sense or would this cause other issues?

It would require reading more data from disk. This read would be sequential so 
I suspect it wouldn't hurt much, including on slower I/O. The main drawback is 
probably that it would trash a bit more of the filesystem cache. That said, I 
agree with you that we should probably look into increasing the block size with 
ZStandard. I just did a run with 1.5x larger blocks and level=6, and it 
slightly outperforms our current BEST_COMPRESSION mode across indexing time, 
disk usage and retrieval time.

||Codec ||Indexing time (ms) ||Disk usage (MB) ||Retrieval time per 10k docs (ms) ||
| ZSTD dict level=6 1.5x larger blocks | 43228 | 57.455 | 1269.22127 |

bq. Or would 3 presets be too much choice?

IMO it would be too much, but I like the fact that ZSTD could help us have two 
options for compression that share the exact same read logic, e.g. if we 
replaced BEST_SPEED with what you suggested for BALANCED: low level ZSTD 
compression with a small block size.

bq. Anyway I see potential for good tradeoffs here.

+1 ZSTD is quite great. I wouldn't use it in the Lucene default codec yet, 
because lucene-core shouldn't have dependencies and we don't want to use JNI in 
the lucene-core build. Maybe we can reconsider when Project Panama lands and it 
gets easier to interact with native libraries.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Tobias Ibounig (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468482#comment-17468482
 ] 

Tobias Ibounig commented on LUCENE-8739:


Would it make sense to increase the block size until retrieval times approach 
those of zlib (between CF and vanilla)?
Would such an increase even make sense or would this cause other issues?

Then there could also be 3 presets:

BEST_SPEED --> stays LZ4
BALANCED --> low-level ZSTD + dict (maybe even a slightly smaller block size, 
for slightly faster retrieval)
BEST_COMPRESSION --> ZSTD with a larger block size and a higher level (maybe 5-9)

Or would 3 presets be too much choice?

Anyway I see potential for good tradeoffs here.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468417#comment-17468417
 ] 

Adrien Grand commented on LUCENE-8739:
--

I updated block sizes so that ZSTD uses the same block sizes as 
BEST_COMPRESSION and it looks much better now.

||Codec ||Indexing time (ms) ||Disk usage (MB) ||Retrieval time per 10k docs (ms) ||
| BEST_SPEED (LZ4 with small blocks) | 35383 | 90.175 | 190.17524 |
| BEST_COMPRESSION (vanilla zlib, DEFLATE level=6) | 76671 | 58.682 | 1910.42106 |
| BEST_COMPRESSION (Cloudflare zlib, DEFLATE level=6) | 54791 | 58.601 | 1395.53593 |
| ZSTD dict (level=1) | 24687 | 63.324 | 928.73997 |
| ZSTD dict (level=2) | 24934 | 63.722 | 977.29911 |
| ZSTD dict (level=3) | 28285 | 62.072 | 938.10886 |
| ZSTD dict (level=4) | 37863 | 60.427 | 969.18655 |
| ZSTD dict (level=5) | 45479 | 59.317 | 941.20922 |
| ZSTD dict (level=6) | 57842 | 58.481 | 881.69049 |
| ZSTD dict (level=7) | 65796 | 58.107 | 886.42249 |

On this dataset, the main benefit seems to be the retrieval speed. Regarding 
indexing times and space efficiency, either you go with level 5 and you are 
faster to index data but less space-efficient than DEFLATE (with the Cloudflare 
zlib), or you go with level 6 and you are more space-efficient but slower to 
index.
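
Editorial note: the level 5 versus level 6 trade-off described above can be checked directly against the table. The sketch below is only a restatement of those rows; it compares each ZSTD dict level with the Cloudflare zlib BEST_COMPRESSION row on indexing time and disk usage. The class and method names are illustrative and not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Compares ZSTD dict levels against BEST_COMPRESSION (Cloudflare zlib),
// using the rows of the table above: {level, indexing ms, disk MB}.
final class LevelTradeoff {
  static final double[][] ZSTD_DICT = {
    {1, 24687, 63.324}, {2, 24934, 63.722}, {3, 28285, 62.072},
    {4, 37863, 60.427}, {5, 45479, 59.317}, {6, 57842, 58.481},
    {7, 65796, 58.107},
  };
  static final double CLOUDFLARE_INDEX_MS = 54791;
  static final double CLOUDFLARE_DISK_MB = 58.601;

  /** Levels that index faster than Cloudflare zlib. */
  static List<Integer> fasterToIndex() {
    List<Integer> levels = new ArrayList<>();
    for (double[] row : ZSTD_DICT) {
      if (row[1] < CLOUDFLARE_INDEX_MS) levels.add((int) row[0]);
    }
    return levels;
  }

  /** Levels that use less disk than Cloudflare zlib. */
  static List<Integer> smallerOnDisk() {
    List<Integer> levels = new ArrayList<>();
    for (double[] row : ZSTD_DICT) {
      if (row[2] < CLOUDFLARE_DISK_MB) levels.add((int) row[0]);
    }
    return levels;
  }

  public static void main(String[] args) {
    System.out.println("faster to index: " + fasterToIndex());
    System.out.println("smaller on disk: " + smallerOnDisk());
  }
}
```

On this dataset the two lists do not overlap: levels 1 to 5 index faster, levels 6 and 7 are smaller on disk, so no single level beats Cloudflare zlib on both axes.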




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-03 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468175#comment-17468175
 ] 

Adrien Grand commented on LUCENE-8739:
--

I may have found the issue: your codec was using the same block size as 
BEST_SPEED, which is smaller than the one used by BEST_COMPRESSION. I left 
comments on the PR to align block sizes with BEST_COMPRESSION, to make ZSTD 
more easily comparable with it.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-03 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468137#comment-17468137
 ] 

Adrien Grand commented on LUCENE-8739:
--

I ran the same benchmark over the above PR with the dictionary mode.

||Codec ||Indexing time (ms) ||Disk usage (MB) ||Retrieval time per 10k docs (ms) ||
| BEST_SPEED | 35383 | 90.175 | 190.17524 |
| BEST_COMPRESSION (vanilla zlib) | 76671 | 58.682 | 1910.42106 |
| BEST_COMPRESSION (Cloudflare zlib) | 54791 | 58.601 | 1395.53593 |
| ZSTD (level=1) | 42433 | 70.527 | 240.04036 |
| ZSTD (level=3) | 53426 | 68.737 | 259.61897 |
| ZSTD (level=6) | 100697 | 66.283 | 251.91177 |
| ZSTD dict (level=1) | 50571 | 69.860 | 254.10496 |
| ZSTD dict (level=3) | 60580 | 68.690 | 266.72929 |
| ZSTD dict (level=6) | 128322 | 65.605 | 251.91177 |

Compression ratios are a bit disappointing. I wonder whether this is because 
DEFLATE outperforms ZSTD on this sort of data or because there is a bug in your 
contribution?





[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-12-27 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465718#comment-17465718
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Added dictionary support for Zstandard - 
https://github.com/apache/lucene/pull/439




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-11-15 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443976#comment-17443976
 ] 

Adrien Grand commented on LUCENE-8739:
--

Side thought: it would be nice to use Project Panama's Foreign linker when it 
gets released instead of depending on this JNI library.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-11-15 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443964#comment-17443964
 ] 

Adrien Grand commented on LUCENE-8739:
--

I ran your PR with the new stored fields benchmark to see how codecs compare:

||Codec ||Indexing time (ms) ||Disk usage (MB) ||Retrieval time per 10k docs (ms) ||
| BEST_SPEED | 35383 | 90.175 | 190.17524 |
| BEST_COMPRESSION (vanilla zlib) | 76671 | 58.682 | 1910.42106 |
| BEST_COMPRESSION (Cloudflare zlib) | 54791 | 58.601 | 1395.53593 |
| ZSTD (level=1) | 42433 | 70.527 | 240.04036 |
| ZSTD (level=3) | 53426 | 68.737 | 259.61897 |
| ZSTD (level=6) | 100697 | 66.283 | 251.91177 |

From a quick look at your PR, it looks like you are not using dictionaries, 
which would explain why we're seeing a worse compression ratio?
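
Editorial note: to illustrate why a dictionary can matter for small blocks, here is a minimal sketch that uses the JDK's built-in DEFLATE (java.util.zip) with a preset dictionary. It is NOT zstd and NOT the code from the PR; it only shows the general technique of priming the compressor with data that resembles the block. The sample records, class name and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Preset-dictionary compression with the JDK's DEFLATE, as a stand-in
// for the dictionary mode discussed in this thread.
final class PresetDictDemo {
  // Hypothetical sample data: the dictionary holds earlier, similar records.
  static final byte[] DICT = ("id=1001;name=Springfield;country=US;feature=PPL\n"
      + "id=1002;name=Riverside;country=US;feature=PPL\n")
      .getBytes(StandardCharsets.UTF_8);
  static final byte[] BLOCK = ("id=1001;name=Springfield;country=US;feature=PPL\n"
      + "id=1003;name=Fairview;country=US;feature=PPL\n")
      .getBytes(StandardCharsets.UTF_8);

  /** Compresses input at level 6; dict may be null for plain DEFLATE. */
  static byte[] compress(byte[] input, byte[] dict) {
    Deflater deflater = new Deflater(6);
    try {
      if (dict != null) deflater.setDictionary(dict);
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  /** Decompresses; the same dict used for compression must be supplied. */
  static byte[] decompress(byte[] compressed, byte[] dict, int originalLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] out = new byte[originalLength];
      int off = 0;
      while (off < originalLength) {
        int n = inflater.inflate(out, off, originalLength - off);
        if (n == 0) {
          if (inflater.needsDictionary() && dict != null) {
            inflater.setDictionary(dict); // stream asked for its dictionary
          } else {
            throw new IllegalStateException("truncated or mismatched data");
          }
        }
        off += n;
      }
      return out;
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    System.out.println("without dictionary: " + compress(BLOCK, null).length + " bytes");
    System.out.println("with dictionary:    " + compress(BLOCK, DICT).length + " bytes");
  }
}
```

With such a short block, most of the redundancy sits in the dictionary rather than inside the block itself, which is why the dictionary run comes out smaller. Whether the same effect explains the ZSTD numbers above is exactly the question being asked in the comment.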




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-11-15 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443912#comment-17443912
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

I have created a pull request - [https://github.com/apache/lucene/pull/439]

I am using Zstd-JNI [https://github.com/luben/zstd-jni] in a new custom codec 
which integrates Zstd compression and decompression in StoredFieldsFormat.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432677#comment-17432677
 ] 

Adrien Grand commented on LUCENE-8739:
--

You need to download 
https://download.geonames.org/export/dump/allCountries.zip, unzip it and then 
use it to run the above benchmark, which is a simple standalone Java class with 
a main method.

To run it with your own codec, you will need to modify the code a bit to use it 
rather than Lucene's default codec.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432590#comment-17432590
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Adrien,

Could you please explain how to compare my stored fields format against 
Lucene's built-in formats?

Thanks!




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432589#comment-17432589
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Mike,

The run with -Dtests.nightly=true completed successfully; it took more than an 
hour!




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432506#comment-17432506
 ] 

Adrien Grand commented on LUCENE-8739:
--

You might be interested in the new simple benchmark for stored fields that we 
added to luceneutil to compare your stored fields format against Lucene's 
built-in formats: 
https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/StoredFieldsBenchmark.java.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432444#comment-17432444
 ] 

Michael McCandless commented on LUCENE-8739:


{quote}My codec passed all test cases with test option -Dtests.codec=MyCodec.
{quote}
Aha, that is great news!  Lucene's tests tend to stress out new Codecs.  If you 
want to evil-up the tests, pass {{-Dtests.nightly=true}}.  The tests will run 
longer but try harder to find problems.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432288#comment-17432288
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Mike,

My codec passed all test cases with test option -Dtests.codec=MyCodec.

Now I am working on the luceneutil benchmark. Thanks for your reply in the dev 
community thread!




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-11 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427037#comment-17427037
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Mike,

I see Adrien has used a JNA-based Zstd implementation, while I have taken the 
JNI approach.

I am working on running all tests using the option -Dtests.codec=MyCodec.

The data above was obtained after running a high load of the Lucene benchmark 
over the Reuters corpus. Should I also capture luceneutil benchmark results? 
While running luceneutil, I observed a few discrepancies in the stats, for 
which I raised an issue to clarify (ref #142).

Please guide!




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-09-30 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422815#comment-17422815
 ] 

Michael McCandless commented on LUCENE-8739:


Wow, these are compelling results!

Can you try running all Lucene unit tests with your new Codec?  Something like 
{{-Dtests.codec=MyCodec}}.  That is a great way to stress out a new Codec to 
look for any problems.  Every test (except those that require a specific Codec) 
will exercise yours.

How does your ([~pru30]) approach compare to [~jpountz]'s?

Have you tried running {{luceneutil}} benchmarks with this new Codec?  I'm very 
curious how it behaves on a larger corpus (English Wikipedia)...




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-09-15 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415363#comment-17415363
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

I have developed a new custom codec that integrates Zstd compression and 
decompression in StoredFieldsFormat only. It uses Zstd-JNI 
([https://github.com/luben/zstd-jni]). With a benchmark run for index and 
search on the reuters21578 corpus (plain text documents derived from 
reuters21578), the following high-level observations were made:

 #  Zstd provides a better compression ratio than LZ4. The index benchmark run 
shows a 30% smaller .fdt (stored field data) file compared to LZ4.
 #  The index run with Zstd has almost the same throughput as the index run 
with LZ4.
 #  The search run with Zstd has 6% higher QPS than the search run with LZ4.

The above implementation is written in Java without dictionary 
compression/decompression, at the default compression level of 3 with a 600 KB 
chunk size (10 * 60 * 1024, same as LZ4).

With all these observations, a Zstd option alongside LZ4 and DEFLATE looks 
promising! Kindly share your thoughts!
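
For anyone who wants to reproduce the size comparison in the first observation, the following is a minimal sketch rather than the script used for the numbers above. It assumes two index directories, one per codec, in which the stored field data lives in plain .fdt files on disk (that is, not packed into compound files). The class name and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: compares the total size of .fdt files in two index directories. */
final class StoredFieldsSizeSketch {

  /** Sums the sizes of all files ending in .fdt directly inside indexDir. */
  static long totalFdtBytes(Path indexDir) throws IOException {
    long total = 0;
    try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*.fdt")) {
      for (Path file : files) {
        total += Files.size(file);
      }
    }
    return total;
  }

  /** Relative reduction of candidate versus baseline, in percent. */
  static double percentSmaller(long baselineBytes, long candidateBytes) {
    return 100.0 * (baselineBytes - candidateBytes) / baselineBytes;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: StoredFieldsSizeSketch <baseline-index-dir> <candidate-index-dir>");
      return;
    }
    long baseline = totalFdtBytes(Paths.get(args[0]));
    long candidate = totalFdtBytes(Paths.get(args[1]));
    System.out.printf("baseline=%d bytes, candidate=%d bytes, %.1f%% smaller%n",
        baseline, candidate, percentSmaller(baseline, candidate));
  }
}
```

Both directories should be built from the same corpus with the same indexing settings, so that the codec is the only variable.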




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-06-07 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358574#comment-17358574
 ] 

Adrien Grand commented on LUCENE-8739:
--

I opened a PR that uses the exact same approach and block sizes as the default 
codec with DEFLATE, but uses ZSTD instead. It calls ZSTD through JNA, so 
libzstd needs to be installed locally.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-06-04 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357132#comment-17357132
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Zstd JNI (https://github.com/luben/zstd-jni) looks very promising and is being 
used in Cassandra, Kafka, and other popular Apache projects. Can we create a 
custom codec using Zstd JNI in the codecs folder 
(https://github.com/apache/lucene/tree/main/lucene/codecs/src/java/org/apache/lucene/codecs)?




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-04-19 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324904#comment-17324904
 ] 

Adrien Grand commented on LUCENE-8739:
--

Hi [~wicked1099], force-merging wouldn't change anything: we still compress 
data into small chunks of ~48kB in order to be able to decompress as little as 
possible when reading a single stored document.

We don't like introducing options in the default codec because it makes 
backward compatibility too hard and prevents us from moving forward. Expert 
users can still create their own codec if they wish to.
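
To make the chunking argument concrete, here is a toy sketch of the idea, not Lucene's actual stored fields code. Documents are buffered into chunks of a configurable size, each chunk is compressed independently (with the JDK's DEFLATE here), and reading one document decompresses only the chunk that holds it. The class name, method names, and the 48 kB threshold used in main are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy store (hypothetical, not Lucene code): chunks are compressed independently. */
final class ChunkedStoreSketch {
  private final int chunkSize;                                 // flush threshold in bytes
  private final List<byte[]> chunks = new ArrayList<>();       // compressed chunks
  private final List<Integer> rawLengths = new ArrayList<>();  // uncompressed size per chunk
  private final List<int[]> locations = new ArrayList<>();     // per doc: {chunk, offset, length}
  private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

  ChunkedStoreSketch(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  /** Appends a document, returns its id, and flushes once the buffer is large enough. */
  int addDocument(String doc) {
    byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
    locations.add(new int[] {chunks.size(), pending.size(), bytes.length});
    pending.write(bytes, 0, bytes.length);
    if (pending.size() >= chunkSize) {
      flush();
    }
    return locations.size() - 1;
  }

  /** Compresses whatever is buffered into a new chunk. */
  void flush() {
    if (pending.size() == 0) {
      return;
    }
    byte[] raw = pending.toByteArray();
    chunks.add(compress(raw));
    rawLengths.add(raw.length);
    pending.reset();
  }

  int chunkCount() {
    return chunks.size();
  }

  /** Reads one document by decompressing only the chunk that contains it. */
  String getDocument(int docId) {
    int[] loc = locations.get(docId);
    byte[] raw = loc[0] < chunks.size()
        ? decompress(chunks.get(loc[0]), rawLengths.get(loc[0]))
        : pending.toByteArray(); // document is still in the unflushed buffer
    return new String(raw, loc[1], loc[2], StandardCharsets.UTF_8);
  }

  static byte[] compress(byte[] data) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  static byte[] decompress(byte[] compressed, int rawLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] out = new byte[rawLength];
      int n = 0;
      while (n < rawLength) {
        int read = inflater.inflate(out, n, rawLength - n);
        if (read == 0) {
          throw new IllegalStateException("truncated or corrupt chunk");
        }
        n += read;
      }
      return out;
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    ChunkedStoreSketch store = new ChunkedStoreSketch(48 * 1024);
    for (int i = 0; i < 2000; i++) {
      store.addDocument("doc " + i + ": the quick brown fox jumps over the lazy dog");
    }
    store.flush();
    System.out.println(store.chunkCount() + " chunks; doc 1234 = " + store.getDocument(1234));
  }
}
```

Because each chunk is compressed on its own, merging segments only rearranges which documents share a chunk; it does not give the compressor a larger context to work with.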




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-04-18 Thread Sean Torres (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324662#comment-17324662
 ] 

Sean Torres commented on LUCENE-8739:
-

If the current runtime compression is comparable to DEFLATE, I would also be 
interested in the gains from ZSTD after a forceMerge of segments is performed.

I believe the use case would differ based on the workload and data set used. 
However, I believe this would be worth including as an option that each user 
can decide to enable on their own.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-04-16 Thread Sean Torres (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324044#comment-17324044
 ] 

Sean Torres commented on LUCENE-8739:
-

How about the performance and storage cost once a force merge action has been 
performed?




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-04-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323762#comment-17323762
 ] 

Dawid Weiss commented on LUCENE-8739:
-

Because it would make it very difficult to work for everyone who embeds Lucene 
- this is a low-level library; java dependencies are a nightmare to maintain.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-04-16 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322945#comment-17322945
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi [~jpountz],

Kindly help us understand why lucene-core can't have dependencies?




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-04-15 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322015#comment-17322015
 ] 

Adrien Grand commented on LUCENE-8739:
--

I forgot to update this issue, but I actually played with ZSTD a few months ago 
using JNA. I have a dirty, ugly, untested branch at 
https://github.com/jpountz/lucene-solr/tree/zstd if you are curious.

The results were good but not as appealing as benchmarks that work on whole 
files. It seems to me that most of the compression gains of ZSTD compared to 
DEFLATE come from the larger sliding window that it uses at compression time 
(DEFLATE can only deduplicate strings that occur within 32kB of each other). 
But given how Lucene splits stored fields into small-ish blocks anyway in order 
to keep decompression fast, ZSTD didn't yield much smaller indexes. Regarding 
compression/decompression speed, ZSTD did perform better than vanilla DEFLATE, 
but most of this gap can actually be closed by using a DEFLATE variant that 
vectorizes the slowest bits, like Cloudflare's DEFLATE, which can be done on 
the default codec by putting the other DEFLATE variant on the LD_LIBRARY_PATH.
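
The sliding-window limit described above can be observed with the JDK's own DEFLATE implementation (the DEFLATE format caps match distances at 32 KiB). The sketch below is an illustration under assumed sizes, not a benchmark from this issue: it measures how many extra compressed bytes a second copy of an incompressible block costs when it directly follows the first copy, and when 64 KiB of unrelated data sits in between. The class and method names are hypothetical.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Illustration only: the cost of a repeated block depends on its distance from the first copy. */
final class DeflateWindowSketch {

  /** Returns the number of bytes produced by DEFLATE at the highest compression level. */
  static int compressedSize(byte[] data) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      deflater.setInput(data);
      deflater.finish();
      byte[] buf = new byte[8192];
      int total = 0;
      while (!deflater.finished()) {
        total += deflater.deflate(buf);
      }
      return total;
    } finally {
      deflater.end();
    }
  }

  static byte[] randomBytes(Random random, int length) {
    byte[] bytes = new byte[length];
    random.nextBytes(bytes);
    return bytes;
  }

  static byte[] concat(byte[]... parts) {
    int length = 0;
    for (byte[] part : parts) {
      length += part.length;
    }
    byte[] out = new byte[length];
    int offset = 0;
    for (byte[] part : parts) {
      System.arraycopy(part, 0, out, offset, part.length);
      offset += part.length;
    }
    return out;
  }

  /** Extra compressed bytes paid for a second copy of block placed after block + gap. */
  static int costOfRepeat(byte[] block, byte[] gap) {
    return compressedSize(concat(block, gap, block)) - compressedSize(concat(block, gap));
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    byte[] block = randomBytes(random, 8 * 1024);    // incompressible on its own
    byte[] farGap = randomBytes(random, 64 * 1024);  // pushes the repeat outside the window
    System.out.println("repeat right after the original costs "
        + costOfRepeat(block, new byte[0]) + " bytes");
    System.out.println("repeat 64 KiB later costs "
        + costOfRepeat(block, farGap) + " bytes");
  }
}
```

When the repeat is within the window it should cost only a small fraction of the block size, whereas the distant repeat should cost roughly as much as the block itself. A compressor with a larger window can exploit the distant repeat, but only if the block being compressed is large enough to contain both copies, which is the limitation described for Lucene's small stored fields blocks.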




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2020-03-26 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067798#comment-17067798
 ] 

Michael McCandless commented on LUCENE-8739:


I think this is worth a deep dive, at least to understand its performance for 
"typical" Lucene use cases ... I've heard (just anecdotally) that ZSTD shows 
impressive speed and compression.  That said, the added complexity in 
implementation is definitely a downside.




[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2019-12-16 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997346#comment-16997346
 ] 

Adrien Grand commented on LUCENE-8739:
--

As I expected, it needs quite a lot of code compared to the roughly 500 lines 
we have for LZ4. If you can run benchmarks, I'd be curious, but in general I 
suspect that the JDK implementation of DEFLATE is more appealing for the kind 
of trade-offs that zstd provides.
