On Mon, Aug 12, 2013 at 8:38 AM, Mathias Lux <m...@itec.uni-klu.ac.at> wrote:
> Hi!
>
> I'm basically searching for a method to put byte[] data into Lucene
> DocValues of type BINARY (see [1]). Currently only primitives and
> Strings are supported according to [1].
>
> I know that this can be done with a custom update handler, but I'd
> like to avoid that.
>

Can you describe what kind of operations you want to do with it?
I don't really know how BinaryField is typically used, but maybe it
could support this option. On the other hand, adding it to BinaryField
might not "buy" you much without some additional work, depending on
what you need to do. For example, if you really want to sort or facet on
the field, SORTED (or SORTED_SET) would probably be a better choice: it
doesn't care that the values are binary.

BINARY, SORTED, and SORTED_SET actually all take byte[]. The differences are:
* SORTED: deduplicates and compresses the unique byte[] values and gives
each document an ordinal number that reflects sort order (for
sorting/faceting/grouping/etc.)
* SORTED_SET: similar, except each document has a "set" (which can be
empty) of ordinal numbers (e.g. for faceting multivalued fields)
* BINARY: just stores the byte[] for each document (no deduplication,
no compression, no ordinals, nothing).
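To make the distinction concrete, here is a toy in-memory model of what SORTED_SET conceptually keeps: a sorted, deduplicated dictionary of the unique byte[] values, plus a (possibly empty) set of ordinals per document. This is plain Java, not Lucene's actual codec or API, and OrdinalSketch and its methods are hypothetical names. SORTED is the special case where every document has exactly one value; BINARY would skip all of this and keep each document's bytes verbatim.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy in-memory model of SORTED / SORTED_SET doc values (NOT Lucene's real codec):
// unique byte[] values are deduplicated into a sorted dictionary, and each
// document stores only ordinals pointing into that dictionary.
class OrdinalSketch {
    // Unsigned lexicographic byte order, the order BytesRef values sort in
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    /** Builds the sorted, deduplicated dictionary of every value in every document. */
    static List<byte[]> dictionary(List<List<byte[]>> docs) {
        TreeSet<byte[]> unique = new TreeSet<>(UNSIGNED);
        for (List<byte[]> values : docs) unique.addAll(values);
        return new ArrayList<>(unique);
    }

    /** SORTED_SET-style: each document gets a sorted set of ordinals (possibly empty). */
    static int[][] ordinalSets(List<List<byte[]>> docs, List<byte[]> dict) {
        int[][] result = new int[docs.size()][];
        for (int d = 0; d < docs.size(); d++) {
            TreeSet<Integer> ords = new TreeSet<>(); // a set, so duplicates collapse
            for (byte[] v : docs.get(d)) {
                ords.add(Collections.binarySearch(dict, v, UNSIGNED));
            }
            result[d] = ords.stream().mapToInt(Integer::intValue).toArray();
        }
        return result;
    }

    public static void main(String[] args) {
        // Three documents; the second one has no value at all.
        List<List<byte[]>> docs = List.of(
            List.of(utf8("red"), utf8("blue")),
            List.of(),
            List.of(utf8("blue"), utf8("green"), utf8("blue")));
        List<byte[]> dict = dictionary(docs);
        for (byte[] v : dict) System.out.print(new String(v, StandardCharsets.UTF_8) + " ");
        System.out.println();
        System.out.println(Arrays.deepToString(ordinalSets(docs, dict)));
    }
}
```

Running main prints the dictionary (blue green red) followed by [[0, 2], [], [0, 1]]: the repeated "blue" is stored once and collapses to ordinal 0, and the empty document gets an empty set.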

So for sorting/faceting, BINARY is generally not very efficient unless
there is something custom going on. For example, Lucene's faceting
package stores the "values" elsewhere, in a separate taxonomy index, so
it uses this type just to encode a delta-compressed ordinal list for
each document.
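As a rough sketch of that kind of per-document encoding, the following packs a sorted ordinal list into a byte[] using the gaps between consecutive ordinals plus a VInt-style variable-length integer (7 payload bits per byte, high bit set while more bytes follow). It is in the spirit of what the faceting package does, but the exact byte layout is an assumption, and DeltaOrdinalCodec is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch: pack one document's sorted ordinal list into a byte[] (as could be stored
// in a BINARY doc values field) using gap (delta) encoding plus a VInt-style
// variable-length integer. The layout is illustrative, not Lucene's facet format.
class DeltaOrdinalCodec {

    /** Requires ords sorted ascending and non-negative. */
    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int gap = ord - prev; // small gaps need fewer bytes than raw ordinals
            prev = ord;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // low 7 bits + "more bytes follow" flag
                gap >>>= 7;
            }
            out.write(gap); // last byte: high bit clear
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] bytes) {
        int[] tmp = new int[bytes.length]; // at most one ordinal per byte
        int count = 0, prev = 0, pos = 0;
        while (pos < bytes.length) {
            int gap = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap; // undo the delta to recover the absolute ordinal
            tmp[count++] = prev;
        }
        return Arrays.copyOf(tmp, count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 7, 8, 200, 100000};
        byte[] packed = encode(ords);
        System.out.println(packed.length + " bytes vs " + ords.length * 4 + " as raw ints");
        System.out.println(Arrays.toString(decode(packed)));
    }
}
```

With the sample ordinals the gaps are 3, 4, 1, 192 and 99800, which take 1+1+1+2+3 = 8 bytes instead of 20 as raw 32-bit ints. The input has to be sorted so the gaps stay small and non-negative.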

For scoring factors/function queries, encoding the values inside one or
more NUMERIC fields (up to 64 bits each) might still be best on average:
the compression applied there is surprisingly efficient.
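A minimal sketch of the NUMERIC approach, assuming two hypothetical scoring factors (a popularity score in [0, 1] and an age in days): quantize each to a small integer range and pack both into the single long a NUMERIC field holds per document. The bitsPerValue helper is a simplified model of "subtract the minimum, then bit-pack", one common technique behind efficient numeric compression; it is not a reproduction of Lucene's codec, and all names here are made up.

```java
import java.util.Arrays;

// Sketch: quantize scoring factors to small integers and pack them into one long per
// document (what a NUMERIC doc values field holds). The factor names, ranges and the
// bit-width estimate are illustrative assumptions, not Lucene's actual codec.
class NumericFactorSketch {

    /** popularity in [0, 1] quantized to 8 bits; ageDays clamped to 16 bits. */
    static long pack(float popularity, int ageDays) {
        long pop = Math.round(Math.max(0f, Math.min(1f, popularity)) * 255f);
        long age = Math.max(0, Math.min(0xFFFF, ageDays));
        return (pop << 16) | age; // bits 16..23 = popularity, bits 0..15 = age
    }

    static float popularity(long packed) { return ((packed >>> 16) & 0xFF) / 255f; }

    static int ageDays(long packed) { return (int) (packed & 0xFFFF); }

    /** Bits per value a simple "subtract the minimum, then bit-pack" scheme would need. */
    static int bitsPerValue(long[] values) {
        long min = Arrays.stream(values).min().orElse(0);
        long max = Arrays.stream(values).max().orElse(0);
        return Math.max(1, 64 - Long.numberOfLeadingZeros(max - min));
    }

    public static void main(String[] args) {
        long[] perDoc = { pack(1.0f, 3), pack(0.5f, 40), pack(0.0f, 365) };
        System.out.println(ageDays(perDoc[1]) + " days, popularity ~" + popularity(perDoc[1]));
        System.out.println(bitsPerValue(perDoc) + " bits/value instead of 64");
    }
}
```

For the three sample documents the packed values span a range that fits in 24 bits, so a scheme like this would need roughly 24 bits per document rather than 64; narrower quantization ranges shrink that further.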
