Hi,

I ran into an issue (previously discussed[1]; quoting Andres out of context that not addressing it then would "[a]ll but guarantee that we'll have this discussion again"[2]) when trying to build a very large expression index that did not fit within the page boundary. The real-world use case was related to a vector search technique where I wanted to use binary quantization based on the relationship between a constant vector (the average at a point-in-time across the entire data set) and the target vector[3][4]. An example:

CREATE INDEX ON embeddings
  USING hnsw((quantization_func(embedding, $VECTOR)) bit_hamming_ops);

However, I ran into the issue in[1], where pg_index was identified as catalog that is missing a toast table, even though `indexprs` is marked for extended storage. Providing a very simple reproducer in psql below:

----
CREATE TABLE def (id int);
SELECT array_agg(n) b FROM generate_series(1,10_000) n \gset
CREATE OR REPLACE FUNCTION vec_quantizer (a int, b int[]) RETURNS bool AS $$ SELECT true $$ LANGUAGE SQL IMMUTABLE;
CREATE INDEX ON def (vec_quantizer(id, :'b'));

ERROR:  row is too big: size 29448, maximum size 8160
---

This can come up with vector searches as vectors can be quite large - the case I was testing involved a 1536-dim floating point vector (~6KB), and the node parse tree pushed past the page boundary by about 2KB.

One could argue that pgvector or an extension can build in capabilities to handle quantization internally without requiring the user to provide a source vector (pgvectorscale does this). However, this also limits flexibility to users, as they may want to bring their own quantization functions to vector searches, e.g., as different quantization techniques emerge, or if a particular technique is more suitable for a person's dataset.

Thanks,

Jonathan

[1] https://www.postgresql.org/message-id/flat/84ddff04-f122-784b-b6c5-3536804495f8%40joeconway.com [2] https://www.postgresql.org/message-id/20180720000356.5zkhvfpsqswngyob%40alap3.anarazel.de
[3] https://github.com/pgvector/pgvector
[4] https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

Reply via email to