Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> The assertion in tsvector_delete_by_indices fails because its counting
> algorithm doesn't expect indices_to_delete to contain multiple
> references to the same index. Maybe that could be fixed by
> uniquifying in tsvector_delete_arr before calling it, but since
> tsvector_delete_by_indices already qsorts its input, it should be able
> to handle duplicates cheaply.
I poked at this and realized that that's not sufficient.  If there are
duplicates in indices_to_delete, then the initial estimate

	tsout->size = tsv->size - indices_count;

is wrong because indices_count is an overestimate of how many lexemes
will actually be removed.  And because the calculation "dataout =
STRPTR(tsout)" depends on tsout->size, we can't just wait till later to
get it right.

We could possibly initialize tsout->size = tsv->size (the maximum
possible value), thereby ensuring that the WordEntry array doesn't
overlap the dataout area; compute the correct tsout->size in the loop;
and then memmove the data area into place to collapse out the wasted
space.  But I think it might be simpler and better-performing just to
de-dup the indices_to_delete array after qsort'ing it; that would
certainly win for the common case of indices_count == 1.

The other problems I noted with failure to delete items seem to stem
from the fact that tsvector_delete_arr relies on tsvector_bsearch to
find items, but the input tsvector is not sorted (never mind de-duped)
by array_to_tsvector.  This seems like simple brain fade in
array_to_tsvector, as AFAICS that's a required property of tsvectors.

			regards, tom lane
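For concreteness, a de-dup pass over the qsort'ed array could look
something like the sketch below.  This is illustrative only, not the
committed fix; compare_int and sort_and_uniq_indices are made-up names.
The point is that once the array is duplicate-free, tsv->size minus the
returned count is again an exact size for the output tsvector.

#include <stdlib.h>

/* Hypothetical comparator for qsort: ascending order of indices. */
static int
compare_int(const void *va, const void *vb)
{
	int		a = *(const int *) va;
	int		b = *(const int *) vb;

	return (a > b) - (a < b);
}

/*
 * Sort indices_to_delete and squeeze out duplicates in place,
 * returning the new (duplicate-free) count.
 */
static int
sort_and_uniq_indices(int *indices_to_delete, int indices_count)
{
	int		i;
	int		j = 0;

	if (indices_count <= 1)
		return indices_count;	/* nothing to de-dup */

	qsort(indices_to_delete, indices_count, sizeof(int), compare_int);

	for (i = 1; i < indices_count; i++)
	{
		if (indices_to_delete[i] != indices_to_delete[j])
			indices_to_delete[++j] = indices_to_delete[i];
	}

	return j + 1;
}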
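As for why unsorted output from array_to_tsvector breaks deletion:
consider a simplified binary search in the style of tsvector_bsearch.
(lexeme_bsearch, the char ** representation, and the strcmp comparison
are simplifications of mine; the real code compares length-prefixed
lexemes stored inside the tsvector.)  If the lexeme array isn't sorted,
a search like this can miss entries that are in fact present, which is
exactly the failure-to-delete symptom.

#include <string.h>

/*
 * Binary search over an array of lexemes that is assumed to be sorted.
 * Returns the index of key, or -1 if not found.  On an unsorted array
 * the halving steps can discard the half that contains the key.
 */
static int
lexeme_bsearch(const char **lexemes, int nitems, const char *key)
{
	int		lo = 0;
	int		hi = nitems - 1;

	while (lo <= hi)
	{
		int		mid = lo + (hi - lo) / 2;
		int		cmp = strcmp(key, lexemes[mid]);

		if (cmp == 0)
			return mid;			/* found */
		else if (cmp < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}

	return -1;					/* not found */
}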