Thomas Munro writes:
> The assertion in tsvector_delete_by_indices fails because its counting
> algorithm doesn't expect indices_to_delete to contain multiple
> references to the same index.  Maybe that could be fixed by
> uniquifying in tsvector_delete_arr before calling it, but since
> tsvector_delete_by_indices already qsorts its input, it should be able
> to handle duplicates cheaply.

I poked at this and realized that that's not sufficient.  If there are
duplicates in indices_to_delete, then the initial estimate

        tsout->size = tsv->size - indices_count;

is wrong because indices_count is an overestimate of how many lexemes
will be removed.  And because the calculation "dataout = STRPTR(tsout)"
depends on tsout->size, we can't just wait till later to get it right.

We could possibly initialize tsout->size = tsv->size (the maximum
possible value), thereby ensuring that the WordEntry array doesn't
overlap the dataout area; compute the correct tsout->size in the loop;
and then memmove the data area into place to collapse out wasted space.
But I think it might be simpler and faster just to de-dup the
indices_to_delete array after qsort'ing it; that would certainly win
for the case of indices_count == 1.
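
To illustrate the de-dup idea, here is a minimal standalone sketch, not the
actual patch.  The names compare_int and sort_and_unique_indices are made up
for this example, and it assumes the indices are plain ints.  The returned
count is what would be subtracted from tsv->size in place of the raw
indices_count.

```c
#include <stdlib.h>

/* Comparator for qsort over int indices (hypothetical helper). */
static int
compare_int(const void *va, const void *vb)
{
    int         a = *(const int *) va;
    int         b = *(const int *) vb;

    return (a > b) - (a < b);
}

/*
 * Sort the indices and collapse duplicates in place.
 * Returns the number of distinct indices.  With zero or one
 * entries there is nothing to sort or collapse.
 */
static int
sort_and_unique_indices(int *indices, int count)
{
    int         i;
    int         kept;

    if (count <= 1)
        return count;

    qsort(indices, count, sizeof(int), compare_int);

    /* Keep each value only the first time it appears. */
    kept = 1;
    for (i = 1; i < count; i++)
    {
        if (indices[i] != indices[kept - 1])
            indices[kept++] = indices[i];
    }
    return kept;
}
```

Since the array is already sorted at that point, the extra pass is linear
and needs no additional memory.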

The other problems I noted with failure to delete items seem to stem
from the fact that tsvector_delete_arr relies on tsvector_bsearch to
find items, but the tsvector built by array_to_tsvector is not sorted
(never mind de-duped).  This seems like simple brain fade in
array_to_tsvector, as AFAICS being sorted and de-duped is a required
property of tsvectors.
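
To show why a binary search needs that property, here is a small standalone
sketch.  It does not use PostgreSQL's data structures: the Lexeme struct, the
helper names, and the comparison rule (bytes of the common prefix first, then
the shorter string first) are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lexeme; not PostgreSQL's WordEntry. */
typedef struct
{
    const char *data;
    size_t      len;
} Lexeme;

/* Order by the bytes of the common prefix, then shorter string first. */
static int
compare_lexeme(const void *va, const void *vb)
{
    const Lexeme *a = (const Lexeme *) va;
    const Lexeme *b = (const Lexeme *) vb;
    size_t      minlen = (a->len < b->len) ? a->len : b->len;
    int         cmp = (minlen > 0) ? memcmp(a->data, b->data, minlen) : 0;

    if (cmp != 0)
        return cmp;
    return (a->len > b->len) - (a->len < b->len);
}

/* Sort the lexemes and drop duplicates in place; returns distinct count. */
static size_t
sort_and_unique_lexemes(Lexeme *items, size_t count)
{
    size_t      i;
    size_t      kept;

    if (count <= 1)
        return count;

    qsort(items, count, sizeof(Lexeme), compare_lexeme);

    kept = 1;
    for (i = 1; i < count; i++)
    {
        if (compare_lexeme(&items[i], &items[kept - 1]) != 0)
            items[kept++] = items[i];
    }
    return kept;
}

/* Binary search; only meaningful on sorted, de-duplicated input. */
static Lexeme *
find_lexeme(Lexeme *items, size_t count, const char *word)
{
    Lexeme      key;

    key.data = word;
    key.len = strlen(word);
    return bsearch(&key, items, count, sizeof(Lexeme), compare_lexeme);
}
```

If find_lexeme were run on the unsorted input instead, bsearch could miss
entries that are present, which matches the symptom of items failing to be
deleted.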

                        regards, tom lane
