On Thu, 11 Nov 2021 at 18:08, David Rowley <dgrowle...@gmail.com> wrote: > > On Thu, 30 Sept 2021 at 10:54, Tom Lane <t...@sss.pgh.pa.us> wrote: > > Actually, the more I look at this the more unhappy I get, because > > it's becoming clear that you have made unfounded semantic > > assumptions. The hash functions generally only promise that they > > will distinguish values that are distinguishable by the associated > > equality operator. We have plenty of data types in which that does > > not map to bitwise equality ... you need not look further than > > float8 for an example. > > I think this part might be best solved by allowing Memoize to work in > a binary mode. We already have datum_image_eq() for performing a > binary comparison on a Datum. We'll also need to supplement that with > a function that generates a hash value based on the binary value too.
Since I really don't want to stop Memoize from working with LATERAL joined function calls, I see no other way other than to make use of a binary key'd cache for cases where we can't be certain that it's safe to make work in non-binary mode. I've had thoughts about if we should just make it work in binary mode all the time, but my thoughts are that that's not exactly a great idea since it could have a negative effect on cache hits due to there being the possibility that some types such as case insensitive text where the number of variations in the binary representation might be vast. The reason this could be bad is that the estimated cache hit ratio is calculated by looking at n_distinct, which is obviously not looking for distinctions in the binary representation. So in binary mode, we may get a lower cache hit ratio than we might think we'll get, even with accurate statistics. I'd like to minimise those times by only using binary mode when we can't be certain that logical mode is safe. The patch does add new fields to the Memoize plan node type, the MemoizeState executor node and also MemoizePath. The new fields do fit inside the padding of the existing structs. I've also obviously had to modify the read/write/copy functions for Memoize to add the new field there too. My understanding is that this should be ok since those are only used for parallel query to send plans to the working and to deserialise it on the worker side. There should never be any version mismatches there. If anyone wants to chime in about my proposed patch for this, then please do so soon. I'm planning to look at this in my Tuesday morning (UTC+13). David