[ https://issues.apache.org/jira/browse/LUCENE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-5150:
Attachment: LUCENE-5150.patch
Here is a patch. It reserves an additional bit in the header to indicate whether
the encoding is inverted (meaning that clean words are 0xFF instead of 0x00).
This should reduce the amount of memory required to build and store dense sets.
Despite this change, compression ratios remain the same for sparse sets.
For random dense sets, I observed compression ratios of 87% when the load
factor is 90% and 20% when the load factor is 99% (vs. 100% before).
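
To illustrate the idea of an inverse encoding, here is a minimal sketch in Java. It is not the code from the patch: the class and method names are hypothetical, and it assumes 8-bit words where a "clean" word is 0x00 in the normal encoding and 0xFF in the inverse one. It only shows how counting the words that are not clean can indicate which encoding leaves fewer words to store verbatim.

```java
// Hypothetical illustration only; not the WAH8DocIdSet implementation.
final class InverseEncodingSketch {

    private InverseEncodingSketch() {}

    /**
     * Counts the words that are NOT clean under the given encoding.
     * Assumption: the clean word is 0x00 normally and 0xFF when inverse.
     */
    static int countDirtyWords(byte[] words, boolean inverse) {
        final byte clean = inverse ? (byte) 0xFF : (byte) 0x00;
        int dirty = 0;
        for (byte w : words) {
            if (w != clean) {
                dirty++;
            }
        }
        return dirty;
    }

    /** Returns true when the inverse encoding leaves fewer dirty words. */
    static boolean shouldInvert(byte[] words) {
        return countDirtyWords(words, true) < countDirtyWords(words, false);
    }

    /**
     * Flips every bit so that 0xFF words become 0x00 words. An encoder
     * could then treat the result like a sparse set and record a single
     * flag bit saying that the stored set is the complement.
     */
    static byte[] flip(byte[] words) {
        byte[] flipped = new byte[words.length];
        for (int i = 0; i < words.length; i++) {
            flipped[i] = (byte) ~words[i];
        }
        return flipped;
    }
}
```

For a dense input such as { 0xFF, 0xFF, 0xFB, 0xFF }, every word is dirty in the normal encoding, while only one word is dirty once the set is treated as inverted.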
WAH8DocIdSet: dense sets compression
Key: LUCENE-5150
URL: https://issues.apache.org/jira/browse/LUCENE-5150
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
Attachments: LUCENE-5150.patch
In LUCENE-5101, Paul Elschot mentioned that it would be interesting to be
able to encode the inverse set so that very dense sets can be compressed too.