Re: Boolean field type

2023-11-13 Thread Michael McCandless
Hi Michael/Mikhails, yet another Mike here: If you create a NumericDocValuesField, and it only ever has one value per doc (0, 1), I think the default Codec will compress it well, though maybe not as well as your idea. It's a neat idea to notice a "very common default value" and not store it and j

Re: Boolean field type

2023-11-10 Thread Michael Froh
Thanks Mikhail and Mike! Mikhail, since you replied, I remembered your work on block joins in Solr (thank you for that, by the way!), which reminded me that it's not unusual for docs in a Lucene index to "mix" their schemata, like in parent/child blocks. If 90% of parent docs are "true" on a Boole

Re: Boolean field type

2023-11-09 Thread Michael Sokolov
Can you require the user to specify missing: true or missing: false semantics? With that, you can decide what to do with the missing values. On Thu, Nov 9, 2023, 7:55 AM Mikhail Khludnev wrote: > Hello Michael. > This optimization "NOT the less common value" assumes that boolean field > is require
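
One possible reading of the "missing: true / missing: false" suggestion, as a minimal sketch: the field declares which value an absent doc should take, and only docs that differ from that default are stored. The class `BooleanFieldWithMissing` and its methods are hypothetical names for illustration; this is plain Java, not a Lucene API.

```java
import java.util.BitSet;

/** Hypothetical sketch, not part of Lucene: a boolean field with a declared "missing" value. */
final class BooleanFieldWithMissing {
    private final boolean missing;                   // value assumed for docs with no stored value
    private final BitSet nonDefault = new BitSet();  // docs whose value differs from `missing`

    BooleanFieldWithMissing(boolean missing) {
        this.missing = missing;
    }

    /** Record a doc; pass null when the doc has no value for this field. */
    void add(int docId, Boolean value) {
        boolean effective = (value == null) ? missing : value;
        if (effective != missing) {
            nonDefault.set(docId);                   // only the non-default value costs storage
        }
    }

    /** Resolve the value of a doc, falling back to the declared default. */
    boolean valueOf(int docId) {
        return nonDefault.get(docId) ? !missing : missing;
    }

    /** Number of docs that actually needed a stored entry. */
    int storedCount() {
        return nonDefault.cardinality();
    }
}
```

With `missing` set to true, a doc added with no value and a doc added with an explicit true are indistinguishable, which is the trade-off this reading accepts in exchange for storing only the false docs.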

Re: Boolean field type

2023-11-09 Thread Mikhail Khludnev
Hello Michael. This optimization "NOT the less common value" assumes that the boolean field is required, but how would one enforce this mandatory-field constraint in Lucene? I'm not aware of anything like a Solr schema or mapping. If saying foo:true is common, it means that the posting list goes like dense seq
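
The concern about required fields can be illustrated with a small sketch. It assumes, for illustration only, that the optimization stores just the docs holding the rarer value and answers a query for the common value by negating that set; `RareValuePostings` is a hypothetical class in plain Java, not how Lucene stores postings.

```java
import java.util.BitSet;

/** Hypothetical sketch, not a Lucene API: index only the rarer value of a boolean field. */
final class RareValuePostings {
    private final int maxDoc;                      // doc IDs range over 0 .. maxDoc - 1
    private final boolean rareValue;               // the less common value, assumed known up front
    private final BitSet rareDocs = new BitSet();  // docs that hold the rare value

    RareValuePostings(int maxDoc, boolean rareValue) {
        this.maxDoc = maxDoc;
        this.rareValue = rareValue;
    }

    void add(int docId, boolean value) {
        if (value == rareValue) {
            rareDocs.set(docId);
        }
        // Docs holding the common value store nothing.
    }

    /** Docs matching field:value. The common value is answered as NOT(rare docs). */
    BitSet matching(boolean value) {
        BitSet result = (BitSet) rareDocs.clone();
        if (value != rareValue) {
            result.flip(0, maxDoc);                // also flips in docs that never had the field
        }
        return result;
    }
}
```

If docs 0, 1, and 3 are added as true, doc 2 as false, and doc 4 is never added, then with false as the rare value a query for true returns docs 0, 1, 3, and 4. Doc 4 matches even though it has no value, so the negation is only correct when every doc carries the field.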