On Fri, Feb 2, 2018 at 10:26 PM, Peter Geoghegan <p...@bowt.ie> wrote: > My proposed commit message > has a full explanation of the Valgrind issue, which I won't repeat > here. Go read it before reading the rest of this e-mail.
I'm going to paste the first two sentences of your proposed commit message in here for the convenience of other readers, since I want to reply to them. # LogicalTapeFreeze() may write out its first block when it is dirty but # not full, and then immediately read the first block back in from its # BufFile as a BLCKSZ-width block. This can only occur in rare cases # where next to no tuples were written out, which is only possible with # parallel external tuplesorts. So, if I understand correctly what you're saying here, valgrind is totally cool with us writing out an only-partially-initialized block to a disk file, but it's got a real problem with us reading that data back into the same memory space it already occupies. That's a little odd. I presume that it's common for the tail of the final block written to be uninitialized, but normally when we then go read block 0, that's some other, fully initialized block. It seems like it would be pretty easy to just suppress the useless read when we've already got the correct data, and I'd lean toward going that direction since it's a valid optimization anyway. But I'd like to hear some opinions from people who use and think about valgrind more than I do (Tom, Andres, Noah, ...?). > It might seem like my suppression is overly broad, or not broad > enough, since it essentially targets LogicalTapeFreeze(). I don't > think it is, though, because this can occur in two places within > LogicalTapeFreeze() -- it can occur in the place we actually saw the > issue on lousyjack, from the ltsReadBlock() call within > LogicalTapeFreeze(), as well as a second place -- when > BufFileExportShared() is called. I found that you have to tweak code > to prevent it happening in the first place before you'll see it happen > in the second place. I don't quite see how that would happen, because BufFileExportShared, at least AFAICS, doesn't touch the buffer? Unfortunately valgrind does not work at all on my laptop -- the server appears to start, but as soon as you try to connect, the whole thing dies with an error claiming that the startup process has failed. So I can't easily test this at the moment. I'll try to get it working, here or elsewhere, but thought I'd send the above reply first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company