Re: [HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Thomas Munrowrites: > The assertion in tsvector_delete_by_indices fails because its counting > algorithm doesn't expect indices_to_delete to contain multiple > references to the same index. Maybe that could be fixed by > uniquifying in tsvector_delete_arr before calling it, but since > tsvector_delete_by_indices already qsorts its input, it should be able > to handle duplicates cheaply. I poked at this and realized that that's not sufficient. If there are duplicates in indices_to_delete, then the initial estimate tsout->size = tsv->size - indices_count; is wrong because indices_count is an overestimate of how many lexemes will be removed. And because the calculation "dataout = STRPTR(tsout)" depends on tsout->size, we can't just wait till later to get it right. We could possibly initialize tsout->size = tsv->size (the maximum possible value), thereby ensuring that the WordEntry array doesn't overlap the dataout area; compute the correct tsout->size in the loop; and then memmove the data area into place to collapse out wasted space. But I think it might be simpler and better-performant just to de-dup the indices_to_delete array after qsort'ing it; that would certainly win for the case of indices_count == 1. The other problems I noted with failure to delete items seem to stem from the fact that tsvector_delete_arr relies on tsvector_bsearch to find items, but the input tsvector is not sorted (never mind de'duped) by array_to_tsvector. This seems like simple brain fade in array_to_tsvector, as AFAICS that's a required property of tsvectors. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
On Thu, Aug 4, 2016 at 9:39 AM, Tom Lanewrote: > Andreas Seltenreich writes: >> the following statement triggers an assertion in tsearch: > >> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), >> '{smith,smith}'::text[]); >> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", >> Line: 511) > > Confirmed here. I notice that the output of array_to_tsvector() is > already fishy in this example: > > regression=# select array_to_tsvector('{smith,smith,smith}'::text[]); > array_to_tsvector > - > 'smith' 'smith' 'smith' > (1 row) > > Shouldn't those have been merged together? You certainly don't get > results like that from other tsvector-producing operations: > > regression=# select to_tsvector('smith smith smith'); > to_tsvector > --- > 'smith':1,2,3 > (1 row) > regression=# select 'smith smith smith'::tsvector; > tsvector > -- > 'smith' > (1 row) > > However, that does not seem to be the proximate cause of the crash > in ts_delete, because this non-duplicated case still crashes: > > select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), > '{smith,smith}'::text[]); > > It kinda looks like you need more than one deletion request for > the first entry in the sorted tsvector, because for example > {smith,foo,toolbox} works but not {smith,too,toolbox}. > > I'm thinking there are two distinct bugs here. The assertion in tsvector_delete_by_indices fails because its counting algorithm doesn't expect indices_to_delete to contain multiple references to the same index. Maybe that could be fixed by uniquifying in tsvector_delete_arr before calling it, but since tsvector_delete_by_indices already qsorts its input, it should be able to handle duplicates cheaply. I was thinking something like this: for (i = j = k = 0; i < tsv->size; i++) { + bool drop_lexeme = false; + /* * Here we should check whether current i is present in * indices_to_delete or not. Since indices_to_delete is already sorted -* we can advance it index only when we have match. +* we can advance it index only when we have match. We do this +* repeatedly, in case indices_to_delete contains duplicate references +* to the same index. */ - if (k < indices_count && i == indices_to_delete[k]) + while (k < indices_count && i == indices_to_delete[k]) { + drop_lexeme = true; k++; - continue; } + if (drop_lexeme) + continue; But that doesn't seem to be enough, there is something else wrong here resulting in garbage output, maybe related to the failure to merge the tsvector... -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
I wrote: > I'm thinking there are two distinct bugs here. Actually, make that three bugs. I was so focused on the crashing that I failed to notice that ts_delete wasn't producing sane answers even when it didn't crash: regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{smith,smith}'::text[]); ts_delete - 'smith' 'foo' 'bar' (1 row) regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{smith,foo}'::text[]); ts_delete --- 'smith' 'bar' (1 row) regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{bar,smith}'::text[]); ts_delete - 'smith' 'foo' 'bar' (1 row) The non-array version is no better: regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'smith'::text); ts_delete - 'smith' 'foo' 'bar' (1 row) regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'foo'::text); ts_delete --- 'smith' 'bar' (1 row) regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'bar'::text); ts_delete - 'smith' 'foo' 'bar' (1 row) I'm not sure if ts_delete takes its second argument as verbatim lexemes or normalizes them first, but none of these words are changed by to_tsvector, so either way it seems to fail to delete stuff it should. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Andreas Seltenreichwrites: > the following statement triggers an assertion in tsearch: > select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), > '{smith,smith}'::text[]); > -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", > Line: 511) Confirmed here. I notice that the output of array_to_tsvector() is already fishy in this example: regression=# select array_to_tsvector('{smith,smith,smith}'::text[]); array_to_tsvector - 'smith' 'smith' 'smith' (1 row) Shouldn't those have been merged together? You certainly don't get results like that from other tsvector-producing operations: regression=# select to_tsvector('smith smith smith'); to_tsvector --- 'smith':1,2,3 (1 row) regression=# select 'smith smith smith'::tsvector; tsvector -- 'smith' (1 row) However, that does not seem to be the proximate cause of the crash in ts_delete, because this non-duplicated case still crashes: select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), '{smith,smith}'::text[]); It kinda looks like you need more than one deletion request for the first entry in the sorted tsvector, because for example {smith,foo,toolbox} works but not {smith,too,toolbox}. I'm thinking there are two distinct bugs here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers