Re: PackedInts functionalities

2023-10-17 Thread Dongyu Xu
t, I do need random access within the block of terms as its main usage is to back the term dictionary. Thanks, Tony From: Adrien Grand Sent: Tuesday, October 17, 2023 1:26 AM To: dev@lucene.apache.org Subject: Re: PackedInts functionalities +1 to what Mikh

Re: PackedInts functionalities

2023-10-17 Thread Adrien Grand
+1 to what Mikhail wrote, this is e.g. how postings work: instead of interleaving doc IDs and frequencies, they always store a block of 128 doc IDs followed by a block of 128 frequencies. For reference, bit packing feels space-inefficient for this kind of data. I would expect docFreqs to have a zi
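To make the block layout described above concrete, here is a minimal sketch of writing one block of 128 doc IDs followed by one block of 128 frequencies, each with its own bit width. It is not Lucene's postings format: the one-byte width header, the function names, and the absence of delta encoding are all assumptions made for illustration, and bit packing is used only because it is simple to show (the message above itself questions whether it suits this data).

```python
BLOCK_SIZE = 128  # block size mentioned in the thread


def bits_required(values):
    """Smallest bit width (at least 1) that can hold every value."""
    return max(1, max(values).bit_length())


def pack_block(values, width):
    """Bit-pack values at a fixed width into little-endian bytes."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    return acc.to_bytes((len(values) * width + 7) // 8, "little")


def unpack_block(data, width, count):
    """Inverse of pack_block."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def encode(doc_ids, freqs):
    """Write all doc IDs, then all freqs (hypothetical layout:
    one width byte followed by the packed values, per block)."""
    out = bytearray()
    for block in (doc_ids, freqs):
        width = bits_required(block)
        out.append(width)
        out += pack_block(block, width)
    return bytes(out)


def decode(data, count=BLOCK_SIZE):
    """Read back the doc ID block and the freq block."""
    pos = 0
    blocks = []
    for _ in range(2):
        width = data[pos]
        pos += 1
        nbytes = (count * width + 7) // 8
        blocks.append(unpack_block(data[pos:pos + nbytes], width, count))
        pos += nbytes
    return blocks[0], blocks[1]


doc_ids = list(range(1000, 1000 + BLOCK_SIZE))
freqs = [(i % 7) + 1 for i in range(BLOCK_SIZE)]
data = encode(doc_ids, freqs)
```

Because the two blocks are separate, the frequencies here take 3 bits each while the doc IDs take 11; interleaving the two at a single shared width would force both to the larger one.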

Re: PackedInts functionalities

2023-10-16 Thread Mikhail Khludnev
Hello Tony, is it possible to write a block of docfreqs and then a block of postingoffsets? Or why not write them as 10-bit integers and then split each one into a quad and a sextet in the posting format code? On Mon, Oct 16, 2023 at 11:50 PM Dongyu Xu wrote: > Hi devs, > > As I was working on https://github.co
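The second suggestion above can be sketched as follows: combine a 4-bit value and a 6-bit value into one 10-bit integer, pack those integers back to back, and split them again on read. The widths come from the "quad and sextet" wording; which field gets which width, the byte order, and the function names are assumptions for illustration, not Lucene's PackedInts API.

```python
FREQ_BITS = 4     # assumed: the "quad" holds the doc freq
OFFSET_BITS = 6   # assumed: the "sextet" holds the posting offset
WIDTH = FREQ_BITS + OFFSET_BITS  # 10 bits per term


def pack(pairs):
    """Pack (freq, offset) pairs contiguously, 10 bits per pair."""
    acc = 0
    for i, (freq, offset) in enumerate(pairs):
        if not 0 <= freq < (1 << FREQ_BITS):
            raise ValueError("freq does not fit in 4 bits")
        if not 0 <= offset < (1 << OFFSET_BITS):
            raise ValueError("offset does not fit in 6 bits")
        combined = (freq << OFFSET_BITS) | offset
        acc |= combined << (i * WIDTH)
    return acc.to_bytes((len(pairs) * WIDTH + 7) // 8, "little")


def get(data, index):
    """Random access: read the pair at `index` without decoding the rest."""
    bit = index * WIDTH
    byte, shift = divmod(bit, 8)
    # A 10-bit value starting at an even bit offset spans at most 2 bytes.
    word = int.from_bytes(data[byte:byte + 2], "little") >> shift
    combined = word & ((1 << WIDTH) - 1)
    return combined >> OFFSET_BITS, combined & ((1 << OFFSET_BITS) - 1)


pairs = [(3, 17), (15, 63), (0, 0), (9, 42)]
data = pack(pairs)
```

Since every entry has the same total width, the position of entry `index` is a simple multiplication, which keeps random access within the block cheap.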

PackedInts functionalities

2023-10-16 Thread Dongyu Xu
Hi devs, As I was working on https://github.com/apache/lucene/issues/12513, I needed to compress positive integers which are used to locate postings etc. To put it concretely, I will need to pack a few values per term contiguously, and those values can have different bit widths. For example, consi