On Thu, Aug 4, 2016 at 9:39 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andreas Seltenreich <seltenre...@gmx.de> writes:
>> the following statement triggers an assertion in tsearch:
>
>> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]),
>> '{smith,smith}'::text[]);
>> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c",
>> Line: 511)
>
> Confirmed here.  I notice that the output of array_to_tsvector() is
> already fishy in this example:
>
> regression=# select array_to_tsvector('{smith,smith,smith}'::text[]);
>     array_to_tsvector
> -------------------------
>  'smith' 'smith' 'smith'
> (1 row)
>
> Shouldn't those have been merged together?  You certainly don't get
> results like that from other tsvector-producing operations:
>
> regression=# select to_tsvector('smith smith smith');
>   to_tsvector
> ---------------
>  'smith':1,2,3
> (1 row)
> regression=# select 'smith smith smith'::tsvector;
>  tsvector
> ----------
>  'smith'
> (1 row)
>
> However, that does not seem to be the proximate cause of the crash
> in ts_delete, because this non-duplicated case still crashes:
>
> select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]),
> '{smith,smith}'::text[]);
>
> It kinda looks like you need more than one deletion request for
> the first entry in the sorted tsvector, because for example
> {smith,foo,toolbox} works but not {smith,too,toolbox}.
>
> I'm thinking there are two distinct bugs here.
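
For reference, the kind of merging asked about above could be sketched as a sort followed by an in-place uniquify pass. This is only a standalone illustration, not the code path array_to_tsvector actually uses: sort_and_uniquify and compare_lexeme are hypothetical names, lexemes are plain C strings here, and plain strcmp ordering is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings (assumed strcmp ordering) */
static int
compare_lexeme(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * Hypothetical helper: sort lexemes in place and collapse adjacent
 * duplicates.  Returns the number of distinct lexemes, which end up
 * at the front of the array.
 */
static int
sort_and_uniquify(const char **lexemes, int count)
{
    int         i,
                last;

    if (count <= 1)
        return count;

    qsort(lexemes, count, sizeof(const char *), compare_lexeme);

    last = 0;
    for (i = 1; i < count; i++)
    {
        /* keep an entry only if it differs from the last one kept */
        if (strcmp(lexemes[i], lexemes[last]) != 0)
            lexemes[++last] = lexemes[i];
    }
    return last + 1;
}
```

With the input {smith,smith,smith} this leaves a single entry, which is what the other tsvector-producing operations quoted above return.
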
The assertion in tsvector_delete_by_indices fails because its counting
algorithm doesn't expect indices_to_delete to contain multiple
references to the same index.  Maybe that could be fixed by
uniquifying in tsvector_delete_arr before calling it, but since
tsvector_delete_by_indices already qsorts its input, it should be able
to handle duplicates cheaply.  I was thinking something like this:

        for (i = j = k = 0; i < tsv->size; i++)
        {
+               bool            drop_lexeme = false;
+
                /*
                 * Here we should check whether current i is present in
                 * indices_to_delete or not. Since indices_to_delete is already sorted
-                * we can advance it index only when we have match.
+                * we can advance it index only when we have match.  We do this
+                * repeatedly, in case indices_to_delete contains duplicate references
+                * to the same index.
                 */
-               if (k < indices_count && i == indices_to_delete[k])
+               while (k < indices_count && i == indices_to_delete[k])
                {
+                       drop_lexeme = true;
                        k++;
-                       continue;
                }
+               if (drop_lexeme)
+                       continue;

But that doesn't seem to be enough, there is something else wrong here
resulting in garbage output, maybe related to the failure to merge the
tsvector...

-- 
Thomas Munro
http://www.enterprisedb.com
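
To make the counting argument easier to check outside the backend, here is a rough standalone sketch of the duplicate-tolerant loop. It is not the PostgreSQL function: delete_by_indices and compare_int are hypothetical stand-ins, the "tsvector" is just an array of C strings, and every index is assumed to be in range.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints */
static int
compare_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical stand-in for the loop in tsvector_delete_by_indices.
 * Copies the entries of "in" whose index is not listed in
 * indices_to_delete into "out" and returns how many were kept.
 * indices_to_delete is sorted in place and may contain duplicates;
 * all indices are assumed to be in [0, size).
 */
static int
delete_by_indices(const char **in, int size,
                  int *indices_to_delete, int indices_count,
                  const char **out)
{
    int         i,
                j,
                k;

    qsort(indices_to_delete, indices_count, sizeof(int), compare_int);

    for (i = j = k = 0; i < size; i++)
    {
        int         drop_lexeme = 0;

        /* consume every reference to index i, not just the first one */
        while (k < indices_count && i == indices_to_delete[k])
        {
            drop_lexeme = 1;
            k++;
        }
        if (drop_lexeme)
            continue;

        out[j++] = in[i];
    }

    /* same invariant as the failing assertion: every request was consumed */
    assert(k == indices_count);
    return j;
}
```

Called with the three lexemes {smith,smithx,smithy} and the deletion indices {0,0}, this keeps smithx and smithy and ends with k equal to indices_count. With the original single if, k would stop at 1 for that input, which matches the reported assertion failure. The sketch only covers the counting; it says nothing about the garbage output mentioned above.
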