Hello, Sergey!
> Today I encountered a segmentation fault caused by the patch
> v20-0007-Add-Datum-storage-support-to-tuplestore.patch. During the merge
> phase, I inserted some tuples into the table so that STIR would have data for
> the validation phase. The segfault occurred during a call to tuplestore_end().
>
> The root cause is that a few functions in the tuplestore code still assume
> that all stored data is a pointer and thus attempt to pfree it. This
> assumption breaks when datumByVal is used, as the data is stored directly and
> not as a pointer. In particular, tuplestore_end(), tuplestore_trim(), and
> tuplestore_clear() incorrectly try to free such values.
>
> When addressing this, please also ensure that context memory accounting is
> handled properly: we should not increment or decrement the remaining context
> memory size when cleaning or trimming datumByVal entries, since no actual
> memory was allocated for them.
>
> Interestingly, I’m surprised you haven’t hit this segfault yourself. Are you
> perhaps testing on an older system where INT8OID is passed by reference? Or
> is your STIR always empty during the validation phase?
Thanks for pointing that out. It looks like tuplestore_trim and
tuplestore_clear are broken, while tuplestore_end seems to be correct
but fails due to previous heap corruption.
In my case, tuplestore_trim and tuplestore_clear aren't called at all
- that's why the issue wasn't detected. I'm not sure why; perhaps some
recent changes in your codebase are affecting that?
Please run a stress test (if you've already applied the in-place fix
for the tuplestore):
ninja && meson test --suite setup && meson test
--print-errorlogs --suite pg_amcheck *006*
This will help ensure everything else is working correctly on your system.
> One more point: I noticed you modified the index_create() function signature.
> You added the relpersistence parameter, which seems unnecessary—
> this can be determined internally by checking if it’s an auxiliary index, in
> which case the index should be marked as unlogged. You also added an
> auxiliaryIndexOfOid argument (do not remember exact naming, but was used for
> dependency). It might be cleaner to pass this via the IndexInfo structure
> instead. index_create() already has dozens of mouthful arguments, and
> external extensions
> (like pg_squeeze) still rely on the old signature, so minimizing changes to
> the function interface would improve compatibility.
Yes, that’s probably a good idea. I was trying to keep it simple from
the perspective of parameters to avoid dealing with some of the tricky
internal logic.
But you're right - it’s better to stick with the old signature.
Best regards,
Mikhail.