[
https://issues.apache.org/jira/browse/LUCENE-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-4161:
Attachment: LUCENE-4161.patch
First version of the patch.
A few things that were internal now need to be exposed, so I tried to do some
clean up:
* {{CODEC_NAME}} and CODEC_VERSION{START,CURRENT} are public,
* the format is an enum (PackedInts.Format.{PACKED,PACKED_SINGLE_BLOCK}),
* improved docs overall.
There are new factory methods get{Reader,ReaderIterator,Writer}NoHeader that
behave like their get{Reader,ReaderIterator,Writer} counterparts, but neither
write nor check a header.
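To make the header/no-header distinction concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Lucene classes: {{NoHeaderSketch}}, its pack/get/write methods, the header fields and the low-bits-first block layout are all assumptions made for illustration, not the format implemented by the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class NoHeaderSketch {
    // Hypothetical header values for this sketch; the real constants live in PackedInts.
    static final String CODEC_NAME = "PackedInts";
    static final int VERSION = 0;

    /** Packs values into 64-bit blocks, lowest bits first. Values must fit in bitsPerValue bits. */
    static long[] pack(long[] values, int bitsPerValue) {
        long[] blocks = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bitsPerValue;
            int block = bitPos >>> 6;
            int offset = bitPos & 63;
            blocks[block] |= values[i] << offset;
            if (offset + bitsPerValue > 64) {
                // The value straddles two blocks: spill its high bits into the next one.
                blocks[block + 1] |= values[i] >>> (64 - offset);
            }
        }
        return blocks;
    }

    /** Reads the value at the given index from blocks produced by pack. */
    static long get(long[] blocks, int bitsPerValue, int index) {
        int bitPos = index * bitsPerValue;
        int block = bitPos >>> 6;
        int offset = bitPos & 63;
        long value = blocks[block] >>> offset;
        if (offset + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - offset);
        }
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        return value & mask;
    }

    /** Serializes the packed blocks, optionally preceded by a self-describing header. */
    static byte[] write(long[] values, int bitsPerValue, boolean withHeader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (withHeader) {
                out.writeUTF(CODEC_NAME);
                out.writeInt(VERSION);
                out.writeInt(bitsPerValue);
                out.writeInt(values.length);
            }
            for (long block : pack(values, bitsPerValue)) {
                out.writeLong(block);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] values = {1, 5, 7, 0, 3};
        System.out.println("with header: " + write(values, 3, true).length + " bytes");
        System.out.println("no header:   " + write(values, 3, false).length + " bytes");
    }
}
```

In this sketch, five 3-bit values fit in a single 8-byte block, so the header ends up larger than the data it describes. A caller of the no-header variant has to store the bits per value and the value count somewhere else.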
Improved performance of Reader/Mutable bulk methods (using code generation, see
http://people.apache.org/~jpountz/packed_ints.html vs.
http://people.apache.org/~jpountz/packed_ints2.html).
{{ReaderIterator}} and {{Writer}} now use the same code as
{{Reader}}/{{Mutable}} bulk methods so they are likely to be much faster too.
In addition, ReaderIterator now allows consumers to retrieve several values at
the same time.
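As a rough illustration of what retrieving several values at once can look like, here is a self-contained sketch of an iterator over packed blocks with a bulk next(count) method. The class name {{BulkIteratorSketch}}, its signatures and its low-bits-first layout are hypothetical and chosen for brevity; this is not the ReaderIterator code from the patch.

```java
import java.util.Arrays;

final class BulkIteratorSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final int valueCount;
    private final long mask;
    private int position = 0;

    BulkIteratorSketch(long[] blocks, int bitsPerValue, int valueCount) {
        this.blocks = blocks;
        this.bitsPerValue = bitsPerValue;
        this.valueCount = valueCount;
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    }

    /** Returns up to count values; the array is shorter (possibly empty) near the end. */
    long[] next(int count) {
        int n = Math.min(count, valueCount - position);
        long[] out = new long[n];
        for (int i = 0; i < n; i++, position++) {
            int bitPos = position * bitsPerValue;
            int block = bitPos >>> 6;
            int offset = bitPos & 63;
            long value = blocks[block] >>> offset;
            if (offset + bitsPerValue > 64) {
                // The value straddles two blocks: fetch its high bits from the next one.
                value |= blocks[block + 1] << (64 - offset);
            }
            out[i] = value & mask;
        }
        return out;
    }

    /** Helper for the demo: packs values lowest bits first. Values must fit in bitsPerValue bits. */
    static long[] pack(long[] values, int bitsPerValue) {
        long[] blocks = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bitsPerValue;
            int offset = bitPos & 63;
            blocks[bitPos >>> 6] |= values[i] << offset;
            if (offset + bitsPerValue > 64) {
                blocks[(bitPos >>> 6) + 1] |= values[i] >>> (64 - offset);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        long[] values = {1, 5, 7, 0, 3, 2, 6};
        BulkIteratorSketch it = new BulkIteratorSketch(pack(values, 3), 3, values.length);
        // Consume the values three at a time until the iterator is exhausted.
        for (long[] chunk = it.next(3); chunk.length > 0; chunk = it.next(3)) {
            System.out.println(Arrays.toString(chunk));
        }
    }
}
```

A consumer that only needs one value at a time can still call next(1); the bulk form mainly saves per-call overhead when decoding long runs.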
{{Direct*}} and {{Packed*ThreeBlocks}} had a lot of duplicate code that could
not be factored out, so I created scripts to generate them.
Something that might still slow down ReaderIterator (probably the most useful
class for codecs) a bit is that ReaderIterator always reads one long at a time.
Adding a method to bulk-read longs to DataInput (similar to readBytes) might
improve performance. This probably deserves another issue in JIRA and can be
done later.
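To sketch the idea of bulk-reading longs, the snippet below compares reading one long per call with reading all the bytes in a single call and decoding them afterwards. It uses java.io.DataInputStream and java.nio.ByteBuffer as stand-ins, since Lucene's DataInput has no such method yet; {{BulkReadLongsSketch}} and its method names are hypothetical, and no performance claim is attached to it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class BulkReadLongsSketch {

    /** Reads count longs with one readLong() call per value. */
    static long[] readOneByOne(byte[] data, int count) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long[] dst = new long[count];
            for (int i = 0; i < count; i++) {
                dst[i] = in.readLong();
            }
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads all the bytes in one call, then decodes them as big-endian longs. */
    static long[] readBulk(byte[] data, int count) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[count * Long.BYTES];
            in.readFully(buffer);
            long[] dst = new long[count];
            // ByteBuffer is big-endian by default, matching DataInputStream.readLong().
            ByteBuffer.wrap(buffer).asLongBuffer().get(dst);
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(1L).putLong(-2L).putLong(Long.MAX_VALUE).array();
        long[] a = readOneByOne(data, 3);
        long[] b = readBulk(data, 3);
        System.out.println(java.util.Arrays.equals(a, b));
    }
}
```

Both methods return the same values; the bulk variant simply replaces many small reads with one large read followed by in-memory decoding.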
Make PackedInts usable by codecs
Key: LUCENE-4161
URL: https://issues.apache.org/jira/browse/LUCENE-4161
Project: Lucene - Java
Issue Type: Improvement
Components: core/store
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-4161.patch
Some codecs might be interested in using
PackedInts.{Writer,Reader,ReaderIterator} to read and write fixed-size values
efficiently.
The problem is that the serialization format is self-contained and always
writes the name of the codec, its version, its number of bits per value and
its format. For example, if you want to use packed ints to store your
postings list, this is a lot of overhead (at least ~60 bytes per term if
you use only one Writer per term, more otherwise).
Users should be able to externalize the storage of metadata to save space.
For example, to use PackedInts to store a postings list, one should be able
to store the codec name, its version and the number of bits per doc in the
header of the terms+postings list instead of having to write it once (or
more!) per term.