Re: [HACKERS] Hash Functions
On Fri, Sep 1, 2017 at 8:01 AM, Robert Haas wrote: > On Thu, Aug 31, 2017 at 8:40 AM, amul sul wrote: > > Fixed in the attached version. > > I fixed these up a bit and committed them. Thanks. > > I think this takes care of adding not only the infrastructure but > support for all the core data types, but I'm not quite sure how to > handle upgrading types in contrib. It looks like citext, hstore, and > several data types provided by isn have hash opclasses, and I think > that there's no syntax for adding a support function to an existing > opclass. We could add that, but I'm not sure how safe it would be. > > TBH, I really don't care much about fixing isn, but it seems like > fixing citext and hstore would be worthwhile. > The attached patches propose fixes for the citext and hstore contrib modules. To make them easy to understand I've split each fix into two parts: 0001 adds a new file for the contrib upgrade & renames an existing file to the higher version, and 0002 is the actual implementation of the extended hash function for that contrib's data type. Regards, Amul 0001-hstore-File-renaming-v1.patch Description: Binary data 0002-hstore-add-extended-hash-function-v1.patch Description: Binary data 0001-citext-File-renaming-v1.patch Description: Binary data 0002-citext-add-extended-hash-function-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Thu, Aug 31, 2017 at 10:55 PM, Tom Lane wrote: >> ALTER OPERATOR FAMILY ADD FUNCTION ... ? >> >> That would result in the functions being considered "loose" in the >> family rather than bound into an operator class. I think that's >> actually the right thing, because they shouldn't be considered >> to be required. > But wouldn't that result in a different effect than the core data type > changes I just did? Possibly --- I have not read that patch. But considering that all core functions are pinned anyway, it doesn't seem like it much matters whether we consider them to be loosely or tightly bound to opclasses. That should only matter if one tries to drop the function. regards, tom lane
Re: [HACKERS] Hash Functions
On Thu, Aug 31, 2017 at 10:55 PM, Tom Lane wrote: > Robert Haas writes: >> I think this takes care of adding not only the infrastructure but >> support for all the core data types, but I'm not quite sure how to >> handle upgrading types in contrib. It looks like citext, hstore, and >> several data types provided by isn have hash opclasses, and I think >> that there's no syntax for adding a support function to an existing >> opclass. We could add that, but I'm not sure how safe it would be. > > ALTER OPERATOR FAMILY ADD FUNCTION ... ? > > That would result in the functions being considered "loose" in the > family rather than bound into an operator class. I think that's > actually the right thing, because they shouldn't be considered > to be required. But wouldn't that result in a different effect than the core data type changes I just did? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
Robert Haas writes: > I think this takes care of adding not only the infrastructure but > support for all the core data types, but I'm not quite sure how to > handle upgrading types in contrib. It looks like citext, hstore, and > several data types provided by isn have hash opclasses, and I think > that there's no syntax for adding a support function to an existing > opclass. We could add that, but I'm not sure how safe it would be. ALTER OPERATOR FAMILY ADD FUNCTION ... ? That would result in the functions being considered "loose" in the family rather than bound into an operator class. I think that's actually the right thing, because they shouldn't be considered to be required. regards, tom lane
Re: [HACKERS] Hash Functions
On Thu, Aug 31, 2017 at 8:40 AM, amul sul wrote: > Fixed in the attached version. I fixed these up a bit and committed them. Thanks. I think this takes care of adding not only the infrastructure but support for all the core data types, but I'm not quite sure how to handle upgrading types in contrib. It looks like citext, hstore, and several data types provided by isn have hash opclasses, and I think that there's no syntax for adding a support function to an existing opclass. We could add that, but I'm not sure how safe it would be. TBH, I really don't care much about fixing isn, but it seems like fixing citext and hstore would be worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas wrote: > On Wed, Aug 30, 2017 at 10:43 AM, amul sul wrote: > > Thanks for the suggestion, I have updated 0002-patch accordingly. > > Using this I found some strange behaviours as follows: > > > > 1) standard and extended0 output for the jsonb_hash case is not the same. > > 2) standard and extended0 output for the hash_range case is not the same when > > input is int4range(550274, 1550274); other cases in the patch are fine. This can > > be reproduced with other values as well, e.g. int8range(1570275, 208112489). > > > > Will look into this tomorrow. > > Those sound like bugs in your patch. Specifically: > > +/* Same approach as hash_range */ > +result = hash_uint32_extended((uint32) flags, seed); > +result ^= lower_hash; > +result = (result << 1) | (result >> 63); > +result ^= upper_hash; Yes, you are correct. > That doesn't give compatible results. The easiest thing to do might > be to rotate the high 32 bits and the low 32 bits separately. > JsonbHashScalarValueExtended has the same problem. > Maybe create a helper function that does something like this (untested): > > ((x << 1) & UINT64CONST(0xfffefffe)) | ((x >> 31) & UINT64CONST(0x10001)) This works as expected; also tested by executing the following SQL multiple times:

SELECT v as value, hash_range(v)::bit(32) as standard,
       hash_range_extended(v, 0)::bit(32) as extended0,
       hash_range_extended(v, 1)::bit(32) as extended1
FROM (VALUES
        (int8range(floor(random() * 100)::int8, floor(random() * 1000)::int8)),
        (int8range(floor(random() * 1000)::int8, floor(random() * 1)::int8)),
        (int8range(floor(random() * 1)::int8, floor(random() * 10)::int8)),
        (int8range(floor(random() * 1000)::int8, floor(random() * 1)::int8)),
        (int8range(floor(random() * 1)::int8, floor(random() * 10)::int8))) x(v)
WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
   OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);

> > Another case which I want to discuss is, extended and standard version of > > hashfloat4, hashfloat8 & hash_numeric function will have the same output for > > zero value irrespective of seed value. Do you think we need a fix for this? > > Yes, I think you should return the seed rather than 0 in the cases > where the current code hard-codes a 0 return. That will give the same > behavior in the seed == 0 case while cheaply getting us something a > bit different when there is a seed. Fixed in the attached version. > BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this patch. Added in the attached version. Thanks for all the suggestions. Regards, Amul 0001-add-optional-second-hash-proc-v4.patch Description: Binary data 0002-test-Hash_functions_v3_wip.patch Description: Binary data
Re: [HACKERS] Hash Functions
On Wed, Aug 30, 2017 at 10:43 AM, amul sul wrote: > Thanks for the suggestion, I have updated 0002-patch accordingly. > Using this I found some strange behaviours as follows: > > 1) standard and extended0 output for the jsonb_hash case is not the same. > 2) standard and extended0 output for the hash_range case is not the same when > input is int4range(550274, 1550274); other cases in the patch are fine. This can > be reproduced with other values as well, e.g. int8range(1570275, 208112489). > > Will look into this tomorrow. Those sound like bugs in your patch. Specifically:

+/* Same approach as hash_range */
+result = hash_uint32_extended((uint32) flags, seed);
+result ^= lower_hash;
+result = (result << 1) | (result >> 63);
+result ^= upper_hash;

That doesn't give compatible results. The easiest thing to do might be to rotate the high 32 bits and the low 32 bits separately. JsonbHashScalarValueExtended has the same problem. Maybe create a helper function that does something like this (untested):

((x << 1) & UINT64CONST(0xfffefffe)) | ((x >> 31) & UINT64CONST(0x10001))

> Another case which I want to discuss is, extended and standard version of > hashfloat4, hashfloat8 & hash_numeric function will have the same output for > zero value irrespective of seed value. Do you think we need a fix for this? Yes, I think you should return the seed rather than 0 in the cases where the current code hard-codes a 0 return. That will give the same behavior in the seed == 0 case while cheaply getting us something a bit different when there is a seed. BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On Tue, Aug 29, 2017 at 11:48 PM, Robert Haas wrote: > On Tue, Aug 22, 2017 at 8:14 AM, amul sul wrote: > > Attaching patch 0002 for the reviewer's testing. > > I think that this 0002 is not something we can think of committing > because there's no guarantee that hash functions will return the same > results on all platforms. However, what we could and, I think, should > do is hash some representative values of each data type and verify > that hash(x) and hashextended(x, 0) come out the same at least as to > the low-order 32 bits -- and maybe that hashextended(x, 1) comes out > different as to the low-order 32 bits. The easiest way to do this > seems to be to cast to bit(32). For example: > > SELECT v, hashint4(v)::bit(32) as standard, > hashint4extended(v, 0)::bit(32) as extended0, > hashint4extended(v, 1)::bit(32) as extended1 > FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v) > WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32) > OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32); > > I suggest adding a version of this for each data type. Thanks for the suggestion, I have updated 0002-patch accordingly. Using this I found some strange behaviours as follows: 1) standard and extended0 output for the jsonb_hash case is not the same. 2) standard and extended0 output for the hash_range case is not the same when input is int4range(550274, 1550274); other cases in the patch are fine. This can be reproduced with other values as well, e.g. int8range(1570275, 208112489). Will look into this tomorrow. Another case which I want to discuss is that the extended and standard versions of the hashfloat4, hashfloat8 & hash_numeric functions will have the same output for a zero value irrespective of the seed value. Do you think we need a fix for this? > From your latest version of 0001, I get: > > rangetypes.c:1297:8: error: unused variable 'rngtypid' > [-Werror,-Wunused-variable] > Oid rngtypid = RangeTypeGetOid(r); Fixed in the attached version. > I suggest not duplicating the comments from each regular function into > the extended function, but just writing /* Same approach as hashfloat8 */ > when the implementation is non-trivial (no need for this if the > extended function is a single line or the original function had no > comments anyway). Fixed in the attached version. > hash_aclitem seems embarrassingly stupid. I suggest that we make the > extended version slightly less stupid -- i.e. if the seed is non-zero, > actually call hash_any_extended on the sum and pass the seed through. > > * Reset info about hash function whenever we pick up new info about > * equality operator. This is so we can ensure that the hash function > * matches the operator. > */ > typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC); > +typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC); > > Adjust comment: "about hash function" -> "about hash functions", "hash > functions matches" -> "hash functions match". Fixed in the attached version. > +extern Datum > +hash_any_extended(register const unsigned char *k, register int > + keylen, uint64 seed); > > Formatting. Fixed in the attached version. Thanks! Regards, Amul 0001-add-optional-second-hash-proc-v3.patch Description: Binary data 0002-test-Hash_functions_v2_wip.patch Description: Binary data
Re: [HACKERS] Hash Functions
On Tue, Aug 22, 2017 at 8:14 AM, amul sul wrote: > Attaching patch 0002 for the reviewer's testing. I think that this 0002 is not something we can think of committing because there's no guarantee that hash functions will return the same results on all platforms. However, what we could and, I think, should do is hash some representative values of each data type and verify that hash(x) and hashextended(x, 0) come out the same at least as to the low-order 32 bits -- and maybe that hashextended(x, 1) comes out different as to the low-order 32 bits. The easiest way to do this seems to be to cast to bit(32). For example:

SELECT v, hashint4(v)::bit(32) as standard,
       hashint4extended(v, 0)::bit(32) as extended0,
       hashint4extended(v, 1)::bit(32) as extended1
FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
   OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);

I suggest adding a version of this for each data type. From your latest version of 0001, I get:

rangetypes.c:1297:8: error: unused variable 'rngtypid' [-Werror,-Wunused-variable]
        Oid rngtypid = RangeTypeGetOid(r);

I suggest not duplicating the comments from each regular function into the extended function, but just writing /* Same approach as hashfloat8 */ when the implementation is non-trivial (no need for this if the extended function is a single line or the original function had no comments anyway). hash_aclitem seems embarrassingly stupid. I suggest that we make the extended version slightly less stupid -- i.e. if the seed is non-zero, actually call hash_any_extended on the sum and pass the seed through.

 * Reset info about hash function whenever we pick up new info about
 * equality operator. This is so we can ensure that the hash function
 * matches the operator.
 */
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);

Adjust comment: "about hash function" -> "about hash functions", "hash functions matches" -> "hash functions match".

+extern Datum
+hash_any_extended(register const unsigned char *k, register int
+                  keylen, uint64 seed);

Formatting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On Tue, Aug 22, 2017 at 5:44 PM, amul sul wrote: > On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas wrote: >> On Fri, Aug 18, 2017 at 1:12 PM, amul sul wrote: >> > I have a small query: what if I want a cache entry with the extended hash >> > function instead of the standard one? I might require that while adding >> > the hash_array_extended function. Do you think we need to extend >> > lookup_type_cache() as well? >> >> Hmm, I thought you had changed the hash partitioning stuff so that it >> didn't rely on lookup_type_cache(). You have to look up the function >> using the opclass provided in the partition key definition; >> lookup_type_cache() will give you the default one for the datatype. >> Maybe just use get_opfamily_proc? > > Yes, we can do that for the partitioning code, but my concern is a little bit different. I apologize, I wasn't clear enough. > > I am trying to extend the hash_array & hash_range functions. The hash_array and > hash_range functions calculate the hash by using the respective hash function for > the given argument type (i.e. array/range element type), and those hash > functions are made available in the TypeCacheEntry via lookup_type_cache(). But > the extended versions of hash_array & hash_range require a respective extended > hash function for those element types. > > I have added hash_array_extended & hash_range_extended functions in the attached > patch 0001, which maintains a local copy of TypeCacheEntry with extended hash > functions. But I am a little bit skeptical about this logic; any > advice/suggestions will be greatly appreciated. Instead, in the attached patch, I have modified lookup_type_cache() to request it to get the extended hash function in the TypeCacheEntry. For that, I've introduced new flags TYPECACHE_HASH_EXTENDED_PROC, TYPECACHE_HASH_EXTENDED_PROC_FINFO & TCFLAGS_CHECKED_HASH_EXTENDED_PROC, and additional variables in the TypeCacheEntry structure to hold the extended hash proc information. > The logic in the rest of the extended hash functions is the same as the standard one. The same holds for the hash_array_extended() & hash_range_extended() functions as well. Regards, Amul 0001-add-optional-second-hash-proc-v2.patch Description: Binary data 0002-test-Hash_functions_wip.patch Description: Binary data
Re: [HACKERS] Hash Functions
On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas wrote: > On Fri, Aug 18, 2017 at 1:12 PM, amul sul wrote: > > I have a small query: what if I want a cache entry with the extended hash > > function instead of the standard one? I might require that while adding > > the hash_array_extended function. Do you think we need to extend > > lookup_type_cache() as well? > > Hmm, I thought you had changed the hash partitioning stuff so that it > didn't rely on lookup_type_cache(). You have to look up the function > using the opclass provided in the partition key definition; > lookup_type_cache() will give you the default one for the datatype. > Maybe just use get_opfamily_proc? Yes, we can do that for the partitioning code, but my concern is a little bit different. I apologize, I wasn't clear enough. I am trying to extend the hash_array & hash_range functions. The hash_array and hash_range functions calculate the hash by using the respective hash function for the given argument type (i.e. array/range element type), and those hash functions are made available in the TypeCacheEntry via lookup_type_cache(). But the extended versions of hash_array & hash_range require a respective extended hash function for those element types. I have added hash_array_extended & hash_range_extended functions in the attached patch 0001, which maintains a local copy of TypeCacheEntry with extended hash functions. But I am a little bit skeptical about this logic; any advice/suggestions will be greatly appreciated. The logic in the rest of the extended hash functions is the same as the standard one. Attaching patch 0002 for the reviewer's testing. Regards, Amul 0001-add-optional-second-hash-proc-v2-wip.patch Description: Binary data 0002-test-Hash_functions.patch Description: Binary data
Re: [HACKERS] Hash Functions
On Fri, Aug 18, 2017 at 1:12 PM, amul sul wrote: > I have a small query: what if I want a cache entry with the extended hash > function instead of the standard one? I might require that while adding > the hash_array_extended function. Do you think we need to extend > lookup_type_cache() as well? Hmm, I thought you had changed the hash partitioning stuff so that it didn't rely on lookup_type_cache(). You have to look up the function using the opclass provided in the partition key definition; lookup_type_cache() will give you the default one for the datatype. Maybe just use get_opfamily_proc? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On Fri, Aug 18, 2017 at 8:49 AM, Robert Haas wrote: > On Wed, Aug 16, 2017 at 5:34 PM, Robert Haas > wrote: > > Attached is a quick sketch of how this could perhaps be done (ignoring > > for the moment the relatively-boring opclass pushups). > > Here it is with some relatively-boring opclass pushups added. I just > did the int4 bit; the same thing will need to be done, uh, 35 more > times. But you get the gist. No, not that kind of gist. I will work on this. I have a small query: what if I want a cache entry with the extended hash function instead of the standard one? I might require that while adding the hash_array_extended function. Do you think we need to extend lookup_type_cache() as well? Regards, Amul
Re: [HACKERS] Hash Functions
On Wed, Aug 16, 2017 at 5:34 PM, Robert Haas wrote: > Attached is a quick sketch of how this could perhaps be done (ignoring > for the moment the relatively-boring opclass pushups). Here it is with some relatively-boring opclass pushups added. I just did the int4 bit; the same thing will need to be done, uh, 35 more times. But you get the gist. No, not that kind of gist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company add-optional-second-hash-proc-v1.patch Description: Binary data
Re: [HACKERS] Hash Functions
Kenneth Marshall writes: > On Wed, Aug 16, 2017 at 05:58:41PM -0400, Tom Lane wrote: >> ... In fact, on perusing the linked-to page >> http://burtleburtle.net/bob/hash/doobs.html >> Bob says specifically that taking b and c from this hash does not >> produce a fully random 64-bit result. He has a new one that does, >> lookup3.c, but probably he hasn't tried to make that bit-compatible >> with the 1997 algorithm. > The updated hash functions that we currently use are based on Bob Jenkins > lookup3.c and using b as the higher order bits is pretty darn good. I had > lobbied to present the 64-bit b+c hash in the original work for similar > reasons. We are definitely not using a lookup2.c version from 1997. Oh --- I overlooked the bit about "Bob's 2006 update". Really that comment block should have been completely rewritten, rather than leaving the original text there, especially since as it stands there are only pointers to the old algorithm. regards, tom lane
Re: [HACKERS] Hash Functions
On Wed, Aug 16, 2017 at 05:58:41PM -0400, Tom Lane wrote: > Robert Haas writes: > > Attached is a quick sketch of how this could perhaps be done (ignoring > > for the moment the relatively-boring opclass pushups). It introduces > > a new function hash_any_extended which differs from hash_any() in that > > (a) it combines both b and c into the result and (b) it accepts a seed > > which is mixed into the initial state if it's non-zero. > > > Comments? > > Hm. Despite the comment at lines 302-304, I'm not sure that we ought > to do this simply by using "b" as the high order bits. AFAICS that > exposes little or no additional randomness; in particular it seems > unlikely to meet Jenkins' original design goal that "every 1-bit and > 2-bit delta achieves avalanche". There might be some simple way to > extend the existing code to produce a mostly-independent set of 32 more > bits, but I wonder if we wouldn't be better advised to just keep Jenkins' > code as-is and use some other method entirely for producing the > 32 new result bits. > > ... In fact, on perusing the linked-to page > http://burtleburtle.net/bob/hash/doobs.html > Bob says specifically that taking b and c from this hash does not > produce a fully random 64-bit result. He has a new one that does, > lookup3.c, but probably he hasn't tried to make that bit-compatible > with the 1997 algorithm. > Hi, The updated hash functions that we currently use are based on Bob Jenkins lookup3.c and using b as the higher order bits is pretty darn good. I had lobbied to present the 64-bit b+c hash in the original work for similar reasons. We are definitely not using a lookup2.c version from 1997. Regards, Ken
Re: [HACKERS] Hash Functions
Robert Haas writes: > Attached is a quick sketch of how this could perhaps be done (ignoring > for the moment the relatively-boring opclass pushups). It introduces > a new function hash_any_extended which differs from hash_any() in that > (a) it combines both b and c into the result and (b) it accepts a seed > which is mixed into the initial state if it's non-zero. > Comments? Hm. Despite the comment at lines 302-304, I'm not sure that we ought to do this simply by using "b" as the high order bits. AFAICS that exposes little or no additional randomness; in particular it seems unlikely to meet Jenkins' original design goal that "every 1-bit and 2-bit delta achieves avalanche". There might be some simple way to extend the existing code to produce a mostly-independent set of 32 more bits, but I wonder if we wouldn't be better advised to just keep Jenkins' code as-is and use some other method entirely for producing the 32 new result bits. ... In fact, on perusing the linked-to page http://burtleburtle.net/bob/hash/doobs.html Bob says specifically that taking b and c from this hash does not produce a fully random 64-bit result. He has a new one that does, lookup3.c, but probably he hasn't tried to make that bit-compatible with the 1997 algorithm. Your injection of the seed as prepended data seems unassailable from the randomness standpoint. But I wonder whether we could do it more cheaply by xoring the seed into the initial a/b/c values --- it's not very clear whether those are magic in any interesting way, or just some randomly chosen constants. Anyway, I'd certainly suggest that whoever embarks on this for real spend some time perusing Jenkins' website. regards, tom lane
Re: [HACKERS] Hash Functions
On Wed, Aug 16, 2017 at 12:38 PM, Tom Lane wrote: > Robert Haas writes: >> After some further thought, I propose the following approach to the >> issues raised on this thread: > >> 1. Allow hash functions to have a second, optional support function, >> similar to what we did for btree opclasses in >> c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will >> have a signature of (opclass_datatype, int64) and should return int64. >> The int64 argument is a salt. When the salt is 0, the low 32 bits of >> the return value should match what the existing hash support function >> returns. Otherwise, the salt should be used to perturb the hash >> calculation. > > +1 Attached is a quick sketch of how this could perhaps be done (ignoring for the moment the relatively-boring opclass pushups). It introduces a new function hash_any_extended which differs from hash_any() in that (a) it combines both b and c into the result and (b) it accepts a seed which is mixed into the initial state if it's non-zero. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company hash-any-extended-v1.patch Description: Binary data
Re: [HACKERS] Hash Functions
On Wed, Aug 16, 2017 at 12:38 PM, Tom Lane wrote: > Robert Haas writes: >> After some further thought, I propose the following approach to the >> issues raised on this thread: > >> 1. Allow hash functions to have a second, optional support function, >> similar to what we did for btree opclasses in >> c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will >> have a signature of (opclass_datatype, int64) and should return int64. >> The int64 argument is a salt. When the salt is 0, the low 32 bits of >> the return value should match what the existing hash support function >> returns. Otherwise, the salt should be used to perturb the hash >> calculation. > > +1 Cool. >> 2. Introduce new hash opfamilies which are faster, more portable, >> and/or better in other ways than the ones we have today. > > This part seems, uh, under-defined and/or over-ambitious and/or unrelated > to the problem at hand. What are the concrete goals? In my view, it's for the person who proposed a new opclass to say what goals they're trying to satisfy with that opclass. A variety of goals that could justify a new opclass have been proposed upthread -- especially in Jeff Davis's original post (q.v.). Such goals could include endianness independence, collation independence, speed, better bit-mixing, and opfamilies that span more types than our current ones. These goals are probably mutually exclusive, in the sense that endianness independence and collation independence are probably pulling in the exact opposite direction from speed, so conceivably there could be multiple opclasses proposed with different trade-offs. I take no position for the present on which of those would be worth accepting into core. I agree with you that all of this is basically unrelated to the problem at hand, if "the problem at hand" means "hash partitioning". In my mind, there are two really serious issues that have been raised on that front. One is the problem of hash joins/aggregates/indexes on hash-partitioned tables coming out lopsided, and adding an optional salt will let us fix that problem. The other is that if you migrate your data to a different encoding or endianness, you might have dump/reload problems, but IMHO the already-committed patch for --load-via-partition-root is as much as really *needs* to be done there. I am less skeptical about the idea of endianness-independent hash functions than he is, but "we can't have $FEATURE_MANY_PEOPLE_WANT until we solve $PROBLEM_ANDRES_THINKS_IS_NOT_PRACTICALLY_SOLVABLE" is not a route to swift victory. In short, I'm proposing to add a method to seed the existing hash functions and get 64 bits out of them, and leaving any other potential improvements to someone who wants to argue for them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
Robert Haas writes: > After some further thought, I propose the following approach to the > issues raised on this thread: > 1. Allow hash functions to have a second, optional support function, > similar to what we did for btree opclasses in > c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will > have a signature of (opclass_datatype, int64) and should return int64. > The int64 argument is a salt. When the salt is 0, the low 32 bits of > the return value should match what the existing hash support function > returns. Otherwise, the salt should be used to perturb the hash > calculation. +1 > 2. Introduce a new hash opfamilies here which are more faster, more > portable, and/or better in other ways than the ones we have today. This part seems, uh, under-defined and/or over-ambitious and/or unrelated to the problem at hand. What are the concrete goals? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Thu, Aug 3, 2017 at 6:47 PM, Robert Haas wrote: > That seems pretty lame, although it's sufficient to solve the > immediate problem, and I have to admit to a certain predilection for > things that solve the immediate problem without creating lots of > additional work. After some further thought, I propose the following approach to the issues raised on this thread: 1. Allow hash functions to have a second, optional support function, similar to what we did for btree opclasses in c6e3ac11b60ac4a8942ab964252d51c1c0bd8845. The second function will have a signature of (opclass_datatype, int64) and should return int64. The int64 argument is a salt. When the salt is 0, the low 32 bits of the return value should match what the existing hash support function returns. Otherwise, the salt should be used to perturb the hash calculation. This design kills two birds with one stone: it gives callers a way to get 64-bit hash values if they want them (which should make Tom happy, and we could later think about plugging it into hash indexes) and it gives us a way of turning a single hash function into many (which should allow us to prevent hash indexes or hash tables built on a hash-partitioned table from having a heavily lopsided distribution, and probably will also make people who are interested in topics like Bloom filters happy). 2. Introduce new hash opfamilies that are faster, more portable, and/or better in other ways than the ones we have today. Given our current rather simplistic notion of a "default" opclass, there doesn't seem to be an easy way to make whatever we introduce here the default for hash partitioning while keeping the existing default for other purposes. That should probably be fixed at some point. However, given the amount of debate this topic has generated, it also doesn't seem likely that we'd actually wish to decide on a different default in the v11 release cycle, so I don't think there's any rush to figure out exactly how we want to fix it.
Focusing on introducing the new opfamilies at all is probably a better use of time, IMHO. Unless anybody strongly objects, I'm going to write a patch for #1 (or convince somebody else to do it) and leave #2 for someone else to tackle if they wish. In addition, I'll tackle (or convince someone else to tackle) the project of adding that second optional support function to every hash opclass in the core repository. Then Amul can update the core hash partitioning patch to use the new infrastructure when it's available and fall back to the existing method when it's not, and I think we'll be in reasonably good shape. Objections to this plan of attack? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Thu, Aug 3, 2017 at 6:08 PM, Andres Freund wrote: >> That's another way to go, but it requires inventing a way to thread >> the IV through the hash opclass interface. > > Only if we really want to do it really well :P. Using a hash_combine() > like > > /* > * Combine two hash values, resulting in another hash value, with decent bit > * mixing. > * > * Similar to boost's hash_combine(). > */ > static inline uint32 > hash_combine(uint32 a, uint32 b) > { > a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2); > return a; > } > > between hash(IV) and the hashfunction should do the trick (the IV needs > to hashed once, otherwise the bit mix is bad). That seems pretty lame, although it's sufficient to solve the immediate problem, and I have to admit to a certain predilection for things that solve the immediate problem without creating lots of additional work. >> We could: >> >> - Invent a new hash_partition AM that doesn't really make indexes but >> supplies hash functions for hash partitioning. >> - Add a new, optional support function 2 to the hash AM that takes a >> value of the type *and* an IV as an argument. >> - Something else. > > Not arguing for it, but one option could also have pg_type.hash* > function(s). True. That is a bit less configurable because you can't then have multiple functions for the same type. Going through the opclass interface means you can have hash_portable_ops and hash_fast_ops and let people choose. But this would be easy to implement and enough for most people in practice. > One thing that I think might be advisable to think about is that we're > atm stuck with a relatively bad hash function for hash indexes (and hash > joins/aggs), and we should probably evolve it at some point. At the same > time there's currently people out there relying on the current hash > functions remaining stable. That to me is a bit of a separate problem. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On 2017-08-03 17:57:37 -0400, Robert Haas wrote: > On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund wrote: > > On 2017-08-03 17:43:44 -0400, Robert Haas wrote: > >> For me, the basic point here is that we need a set of hash functions > >> for hash partitioning that are different than what we use for hash > >> indexes and hash joins -- otherwise when we hash partition a table and > >> create hash indexes on each partition, those indexes will have nasty > >> clustering. Partitionwise hash joins will have similar problems. So, > >> a new set of hash functions specifically for hash partitioning is > >> quite desirable. > > > > Couldn't that just as well be solved by being a bit smarter with an IV? I > > doubt we want to end up with different hashfunctions for sharding, > > partitioning, hashjoins (which seems to form a hierarchy). Having a > > working hash-combine function, or even better a hash API that can > > continue to use the hash's internal state, seems a more scalable > > solution. > > That's another way to go, but it requires inventing a way to thread > the IV through the hash opclass interface. Only if we really want to do it really well :P. Using a hash_combine() like

/*
 * Combine two hash values, resulting in another hash value, with decent bit
 * mixing.
 *
 * Similar to boost's hash_combine().
 */
static inline uint32
hash_combine(uint32 a, uint32 b)
{
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}

between hash(IV) and the hashfunction should do the trick (the IV needs to be hashed once, otherwise the bit mix is bad). > That's actually sort of a > problem anyway. Maybe I ought to have started with the question of > how we're going to make that end of things work. +1 for that plan. > We could: > > - Invent a new hash_partition AM that doesn't really make indexes but > supplies hash functions for hash partitioning. > - Add a new, optional support function 2 to the hash AM that takes a > value of the type *and* an IV as an argument. > - Something else.
Not arguing for it, but one option could also have pg_type.hash* function(s). One thing that I think might be advisable to think about is that we're atm stuck with a relatively bad hash function for hash indexes (and hash joins/aggs), and we should probably evolve it at some point. At the same time there's currently people out there relying on the current hash functions remaining stable. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
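The hash_combine() Andres quotes can be dropped between hash(IV) and the per-value hash exactly as he describes. The sketch below shows the shape, including hashing the IV once first (his parenthetical caveat); `toy_hash` and `salted_hash` are hypothetical stand-ins for a real opclass hash function and the caller-side glue.

```c
#include <stdint.h>

/* hash_combine() as quoted in the thread (boost-style mixing). */
static inline uint32_t hash_combine(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/* Stand-in for a real opclass hash support function (illustrative). */
static uint32_t toy_hash(uint32_t k)
{
    k ^= k >> 16;
    k *= 0x45d9f3bU;
    k ^= k >> 16;
    return k;
}

/* Salted hash: combine hash(IV) with hash(value).  Per the thread,
 * the IV must itself be hashed once first, otherwise the bit mixing
 * is poor. */
static uint32_t salted_hash(uint32_t value, uint32_t iv)
{
    return hash_combine(toy_hash(iv), toy_hash(value));
}
```

With distinct IVs for, say, partition routing and hash-index bucketing, the same value no longer lands in correlated buckets, without needing separate hash opfamilies.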
Re: [HACKERS] Hash Functions
On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund wrote: > On 2017-08-03 17:43:44 -0400, Robert Haas wrote: >> For me, the basic point here is that we need a set of hash functions >> for hash partitioning that are different than what we use for hash >> indexes and hash joins -- otherwise when we hash partition a table and >> create hash indexes on each partition, those indexes will have nasty >> clustering. Partitionwise hash joins will have similar problems. So, >> a new set of hash functions specifically for hash partitioning is >> quite desirable. > > Couldn't that just as well solved by being a bit smarter with an IV? I > doubt we want to end up with different hashfunctions for sharding, > partitioning, hashjoins (which seems to form a hierarchy). Having a > working hash-combine function, or even better a hash API that can > continue to use the hash's internal state, seems a more scalable > solution. That's another way to go, but it requires inventing a way to thread the IV through the hash opclass interface. That's actually sort of a problem anyway. Maybe I ought to have started with the question of how we're going to make that end of things work. We could: - Invent a new hash_partition AM that doesn't really make indexes but supplies hash functions for hash partitioning. - Add a new, optional support function 2 to the hash AM that takes a value of the type *and* an IV as an argument. - Something else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Thu, Aug 3, 2017 at 5:32 PM, Andres Freund wrote: >> Do you have any feeling for which of those endianness-independent hash >> functions might be a reasonable choice for us? > > Not a strong / very informed one, TBH. > > I'm not convinced it's worth trying to achieve this in the first place, > now that we "nearly" have the insert-via-parent feature. Isn't this a > lot of work, for very little practical gain? Having to select that when > switching architectures still seems unproblematic. People will just > about never switch endianess, so even a tiny performance & effort > overhead doesn't seem worth it to me. I kind of agree with you. There are some advantages of being endian-independent, like maybe your hash partitioning is really across multiple shards, and not all the shards are the same machine architecture, but it's not going to come up for most people. For me, the basic point here is that we need a set of hash functions for hash partitioning that are different than what we use for hash indexes and hash joins -- otherwise when we hash partition a table and create hash indexes on each partition, those indexes will have nasty clustering. Partitionwise hash joins will have similar problems. So, a new set of hash functions specifically for hash partitioning is quite desirable. Given that, if it's not a big problem to pick ones that have the portability properties at least some people want, I'd be inclined to do it. If it results in a noticeable slowdown on macrobenchmarks, then not so much, but otherwise, I'd rather do what people are asking for than spend time arguing about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
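The "nasty clustering" Robert describes is easy to demonstrate concretely: if partition routing and hash-index bucketing use the same hash value, each partition's index only ever sees one residue class of hash values. The sketch below uses a toy hash and hypothetical counts (4 partitions, 8 index buckets); it is an illustration of the effect, not PostgreSQL code.

```c
#include <stdint.h>

/* toy hash function, standing in for a real one (illustrative) */
static uint32_t toy_hash(uint32_t k)
{
    k ^= k >> 16;
    k *= 0x45d9f3bU;
    k ^= k >> 16;
    return k;
}

/* Route keys with 4 partitions, then bucket each key within the
 * partition's hash index (8 buckets) using the SAME hash value.
 * Count how many of the 8 index buckets partition 0 actually uses:
 * since h % 4 == 0 implies h % 8 is 0 or 4, at most 2 of the 8
 * buckets can ever be populated -- the clustering in question. */
static int buckets_used_in_partition0(uint32_t nkeys)
{
    int seen[8] = {0};
    int used = 0;

    for (uint32_t k = 0; k < nkeys; k++)
    {
        uint32_t h = toy_hash(k);

        if (h % 4 != 0)          /* key routed to another partition */
            continue;
        seen[h % 8] = 1;         /* index bucket, same hash value */
    }
    for (int i = 0; i < 8; i++)
        used += seen[i];
    return used;
}
```

A salted or distinct hash for the index would spread the partition's keys over all 8 buckets instead.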
Re: [HACKERS] Hash Functions
Hi, On 2017-08-03 17:43:44 -0400, Robert Haas wrote: > For me, the basic point here is that we need a set of hash functions > for hash partitioning that are different than what we use for hash > indexes and hash joins -- otherwise when we hash partition a table and > create hash indexes on each partition, those indexes will have nasty > clustering. Partitionwise hash joins will have similar problems. So, > a new set of hash functions specifically for hash partitioning is > quite desirable. Couldn't that just as well be solved by being a bit smarter with an IV? I doubt we want to end up with different hashfunctions for sharding, partitioning, hashjoins (which seems to form a hierarchy). Having a working hash-combine function, or even better a hash API that can continue to use the hash's internal state, seems a more scalable solution. Greetings, Andres Freund
Re: [HACKERS] Hash Functions
Hi, On 2017-08-03 17:09:41 -0400, Robert Haas wrote: > On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund wrote: > > Just to clarify: I don't think it's a problem to do so for integers and > > most other simple scalar types. There's plenty of hash algorithms that are > > endianness independent, and the rest is just a bit of care. > > Do you have any feeling for which of those endianness-independent hash > functions might be a reasonable choice for us? Not a strong / very informed one, TBH. I'm not convinced it's worth trying to achieve this in the first place, now that we "nearly" have the insert-via-parent feature. Isn't this a lot of work, for very little practical gain? Having to select that when switching architectures still seems unproblematic. People will just about never switch endianness, so even a tiny performance & effort overhead doesn't seem worth it to me. Leaving that aside: > https://github.com/markokr/pghashlib implements various hash functions > for PostgreSQL, and claims that, of those implemented, crc32, Jenkins, > lookup3be and lookup3le, md5, and siphash24 are endian-independent. > An interesting point here is that Jeff Davis asserted in the original > post on this thread that our existing hash_any() wasn't portable, but > our current hash_any seems to be the Jenkins algorithm -- so I'm > confused. Part of the problem seems to be that, according to > https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of > those. I don't know whether the one in pghashlib is the same one > we've implemented. IIUC lookup3be/le from Marko's hashlib just has an endianness conversion added. I'd personally not go for jenkins3, it's not particularly fast, nor does it balance that out w/ being cryptographically secure. > Kenneth Marshall suggested xxhash as an endian-independent algorithm > upthread. Code for that is available under a 2-clause BSD license. Yea, that'd have been one of my suggestions too.
Especially as I still want to implement better compression using lz4, and that'll depend on xxhash in turn. > PostgreSQL page checksums use an algorithm based on, but not exactly, > FNV-1a. See storage/checksum_impl.h. The comments there say this > algorithm was chosen with speed in mind. Our version is not > endian-independent because it folds in 4-byte integers rather than > 1-byte integers, but plain old FNV-1a *is* endian-independent and > could be used. Non-SIMD implementations of FNV (SIMD use is what we hope to achieve with our implementation, which is why we have separate compiler flags for that file) are, to my knowledge, not particularly fast. And the SIMD tricks are, to my knowledge, not really applicable to the case at hand here. So I'm not a fan of choosing FNV. > We also have an implementation of CRC32C in core - see port/pg_crc32.h > and port/pg_crc32c_sb8.c. It's not clear to me whether this is > Endian-independent or not, although there is stuff that depends on > WORDS_BIGENDIAN, so, uh, maybe? The results should be endian independent. It depends on WORDS_BIGENDIAN because we need different pre-computed tables depending on endianness. Greetings, Andres Freund
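Plain FNV-1a folds one byte at a time, which is why it stays endian-independent as long as the caller produces the input bytes in a canonical order rather than memcpy'ing the in-memory representation of a wider integer. A minimal sketch (the standard FNV-1a constants; `fnv1a_uint32` and its least-significant-byte-first serialization order are an assumption made for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time FNV-1a (32-bit). */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hash a uint32 portably: emit its bytes with shifts in a fixed
 * (least-significant-first) order, so the result does not depend on
 * the host's byte order the way hashing the raw 4-byte word would. */
static uint32_t fnv1a_uint32(uint32_t v)
{
    unsigned char b[4];

    b[0] = v & 0xff;
    b[1] = (v >> 8) & 0xff;
    b[2] = (v >> 16) & 0xff;
    b[3] = (v >> 24) & 0xff;
    return fnv1a(b, 4);
}
```

The page-checksum variant Robert mentions gives up this property by folding 4-byte words, trading portability for speed.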
Re: [HACKERS] Hash Functions
On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund wrote: > Just to clarify: I don't think it's a problem to do so for integers and > most other simple scalar types. There's plenty of hash algorithms that are > endianness independent, and the rest is just a bit of care. Do you have any feeling for which of those endianness-independent hash functions might be a reasonable choice for us? https://github.com/markokr/pghashlib implements various hash functions for PostgreSQL, and claims that, of those implemented, crc32, Jenkins, lookup3be and lookup3le, md5, and siphash24 are endian-independent. An interesting point here is that Jeff Davis asserted in the original post on this thread that our existing hash_any() wasn't portable, but our current hash_any seems to be the Jenkins algorithm -- so I'm confused. Part of the problem seems to be that, according to https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of those. I don't know whether the one in pghashlib is the same one we've implemented. Kenneth Marshall suggested xxhash as an endian-independent algorithm upthread. Code for that is available under a 2-clause BSD license. PostgreSQL page checksums use an algorithm based on, but not exactly, FNV-1a. See storage/checksum_impl.h. The comments there say this algorithm was chosen with speed in mind. Our version is not endian-independent because it folds in 4-byte integers rather than 1-byte integers, but plain old FNV-1a *is* endian-independent and could be used. We also have an implementation of CRC32C in core - see port/pg_crc32.h and port/pg_crc32c_sb8.c. It's not clear to me whether this is Endian-independent or not, although there is stuff that depends on WORDS_BIGENDIAN, so, uh, maybe? Some other possibly-interesting links: https://research.neustar.biz/2011/12/29/choosing-a-good-hash-function-part-2/ http://greenrobot.org/essentials/features/performant-hash-functions-for-java/comparison-of-hash-functions/ https://www.strchr.com/hash_functions Thoughts?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On Fri, Jun 2, 2017 at 10:19 AM, Joe Conway wrote: >> Yeah, that's not crazy. I find it a bit surprising in terms of the >> semantics, though. SET >> when_i_try_to_insert_into_a_specific_partition_i_dont_really_mean_it = >> true? > > Maybe > SET partition_tuple_retry = true; > -or- > SET partition_tuple_reroute = true; > ? > > I like the idea of only rerouting when failing constraints although I > can envision where there might be use cases where you essentially want > to re-partition and therefore reroute everything, leading to: > > SET partition_tuple_reroute = (none | error | all); Personally, I think it's more elegant to make this a pg_dump option than to make it a server GUC, but I'm not going to spend time fighting the server GUC idea if other people like it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 06/02/2017 05:47 AM, Robert Haas wrote: > On Fri, Jun 2, 2017 at 1:24 AM, Jeff Davis wrote: >> 2. I basically see two approaches to solve the problem: >> (a) Tom suggested at PGCon that we could have a GUC that >> automatically causes inserts to the partition to be re-routed through >> the parent. We could discuss whether to always route through the >> parent, or do a recheck on the partition constraints and only reroute >> tuples that will fail it. If the user gets into trouble, the worst >> that would happen is a helpful error message telling them to set the >> GUC. I like this idea. > > Yeah, that's not crazy. I find it a bit surprising in terms of the > semantics, though. SET > when_i_try_to_insert_into_a_specific_partition_i_dont_really_mean_it = > true? Maybe SET partition_tuple_retry = true; -or- SET partition_tuple_reroute = true; ? I like the idea of only rerouting when failing constraints although I can envision where there might be use cases where you essentially want to re-partition and therefore reroute everything, leading to: SET partition_tuple_reroute = (none | error | all); Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] Hash Functions
On Fri, Jun 2, 2017 at 1:24 AM, Jeff Davis wrote: > 1. For range partitioning, I think it's "yes, a little". As you point > out, there are already some weird edge cases -- the main way range > partitioning would make the problem worse is simply by having more > users. I agree. > But for hash partitioning I think the problems will become more > substantial. Different encodings, endian issues, etc. will be a > headache for someone, and potentially a bad day if they are urgently > trying to restore on a new machine. We should remember that not > everyone is a full-time postgres DBA, and users might reasonably think > that the default options to pg_dump[all] will give them a portable > dump. I agree to an extent. I think the problem will be worse for hash partitioning but I might disagree with you on how much worse. I think that most people don't do encoding conversions very often, and that those who do know (or should know) enough to expect trouble. I think most people do endian-ness conversions almost never, but since that's a matter of hardware not configuration I'd like to paper over that case if we can. > 2. I basically see two approaches to solve the problem: > (a) Tom suggested at PGCon that we could have a GUC that > automatically causes inserts to the partition to be re-routed through > the parent. We could discuss whether to always route through the > parent, or do a recheck on the partition constrains and only reroute > tuples that will fail it. If the user gets into trouble, the worst > that would happen is a helpful error message telling them to set the > GUC. I like this idea. Yeah, that's not crazy. I find it a bit surprising in terms of the semantics, though. SET when_i_try_to_insert_into_a_specific_partition_i_dont_really_mean_it = true? > (b) I had suggested before that we could make the default text dump > (and the default output from pg_restore, for consistency) route > through the parent. 
Advanced users would dump with -Fc, and pg_restore > would support an option to do partition-wise loading. To me, this is > simpler, but users might forget to use (or not know about) the > pg_restore option and then it would load more slowly. Also, the ship > is sailing on range partitioning, so we might prefer option (a) just > to avoid making any changes. I think this is a non-starter. The contents of the dump shouldn't depend on the format chosen; that is bound to confuse somebody. I also do not wish to inflict a speed penalty on the users of plain-format dumps. >> 2. Add an option like --dump-partition-data-with-parent. I'm not sure >> who originally proposed this, but it seems that everybody likes it. >> What we disagree about is the degree to which it's sufficient. Jeff >> Davis thinks it doesn't go far enough: what if you have an old >> plain-format dump that you don't want to hand-edit, and the source >> database is no longer available? Most people involved in the >> unconference discussion of partitioning at PGCon seemed to feel that >> wasn't really something we should be worry about too much. I had been >> taking that position also, more or less because I don't see that there >> are better alternatives. > > If the suggestions above are unacceptable, and we don't come up with > anything better, then of course we have to move on. I am worrying now > primarily because now is the best time to worry; I don't expect any > horrible outcome. OK. >> 3. Implement portable hash functions (Jeff Davis or me, not sure >> which). Andres scoffed at this idea, but I still think it might have >> legs. > > I think it reduces the problem, which has value, but it's hard to make > it rock-solid. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Thu, Jun 1, 2017 at 11:25 AM, Andres Freund wrote: > Secondly, I think that's to a significant degree caused by > the fact that in practice people way more often partition on types like > int4/int8/date/timestamp/uuid rather than text - there's rarely good > reasons to do the latter. Once we support more pushdowns to partitions, the only question is: what are your join keys and what are your grouping keys? Text is absolutely a normal join key or group key. Consider joins on a user ID or grouping by a model number. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Thu, Jun 1, 2017 at 10:59 AM, Robert Haas wrote: > 1. Are the new problems worse than the old ones? > > 2. What could we do about it? Exactly the right questions. 1. For range partitioning, I think it's "yes, a little". As you point out, there are already some weird edge cases -- the main way range partitioning would make the problem worse is simply by having more users. But for hash partitioning I think the problems will become more substantial. Different encodings, endian issues, etc. will be a headache for someone, and potentially a bad day if they are urgently trying to restore on a new machine. We should remember that not everyone is a full-time postgres DBA, and users might reasonably think that the default options to pg_dump[all] will give them a portable dump. 2. I basically see two approaches to solve the problem: (a) Tom suggested at PGCon that we could have a GUC that automatically causes inserts to the partition to be re-routed through the parent. We could discuss whether to always route through the parent, or do a recheck on the partition constraints and only reroute tuples that will fail it. If the user gets into trouble, the worst that would happen is a helpful error message telling them to set the GUC. I like this idea. (b) I had suggested before that we could make the default text dump (and the default output from pg_restore, for consistency) route through the parent. Advanced users would dump with -Fc, and pg_restore would support an option to do partition-wise loading. To me, this is simpler, but users might forget to use (or not know about) the pg_restore option and then it would load more slowly. Also, the ship is sailing on range partitioning, so we might prefer option (a) just to avoid making any changes. I am fine with either option. > 2. Add an option like --dump-partition-data-with-parent. I'm not sure > who originally proposed this, but it seems that everybody likes it. > What we disagree about is the degree to which it's sufficient.
Jeff > Davis thinks it doesn't go far enough: what if you have an old > plain-format dump that you don't want to hand-edit, and the source > database is no longer available? Most people involved in the > unconference discussion of partitioning at PGCon seemed to feel that > wasn't really something we should be worry about too much. I had been > taking that position also, more or less because I don't see that there > are better alternatives. If the suggestions above are unacceptable, and we don't come up with anything better, then of course we have to move on. I am worrying now primarily because now is the best time to worry; I don't expect any horrible outcome. > 3. Implement portable hash functions (Jeff Davis or me, not sure > which). Andres scoffed at this idea, but I still think it might have > legs. I think it reduces the problem, which has value, but it's hard to make it rock-solid. > make fast. Those two things also solve different parts of the > problem; one is insulating the user from a difference in hardware > architecture, while the other is insulating the user from a difference > in user-selected settings. I think that the first of those things is > more important than the second, because it's easier to change your > settings than it is to change your hardware. Good point. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 06/01/2017 11:25 AM, Andres Freund wrote: > On 2017-06-01 13:59:42 -0400, Robert Haas wrote: >> My personal guess is that most people will prefer the fast >> hash functions over the ones that solve their potential future >> migration problems, but, hey, options are good. > > I'm pretty sure that will be the case. I'm not sure that adding > infrastructure to allow for something that nobody will use in practice > is a good idea. If there ends up being demand for it, we can still go there. > > I think that the number of people that migrate between architectures is > low enough that this isn't going to be a very common issue. Having some > feasible way around this is important, but I don't think we should > optimize heavily for it by developing new infrastructure / complicating > experience for the 'normal' uses. +1 Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] Hash Functions
On 2017-06-01 13:59:42 -0400, Robert Haas wrote: > I'm not actually aware of an instance where this has bitten anyone, > even though it seems like it certainly could have and maybe should've > gotten somebody at some point. Has anyone else? Two comments: First, citus has been doing hash-partitioning and append/range partitioning for a while now, and I'm not aware of anyone being bitten by this (although there've been plenty of other things ;)), even though there've been cases upgrading to different collation & encodings. Secondly, I think that's to a significant degree caused by the fact that in practice people way more often partition on types like int4/int8/date/timestamp/uuid rather than text - there are rarely good reasons to do the latter. > Furthermore, neither range nor list partitioning depends on properties > of the hardware, like how wide integers are, or whether they are > stored big-endian. A naive approach to hash partitioning would depend > on those things. That's clearly worse. I don't think our current int4/8 hash functions depend on FLOAT8PASSBYVAL. > 3. Implement portable hash functions (Jeff Davis or me, not sure > which). Andres scoffed at this idea, but I still think it might have > legs. Coming up with a hashing algorithm for integers that produces > the same results on big-endian and little-endian systems seems pretty > feasible, even with the additional constraint that it should still be > fast. Just to clarify: I don't think it's a problem to do so for integers and most other simple scalar types. There are plenty of hash algorithms that are endianness-independent, and the rest is just a bit of care. Where I see a lot more issues is doing so for more complex types like arrays, jsonb, postgis geometry/geography types and the like, where the fast and simple implementation is to just hash the entire datum - and that'll very commonly not be portable at all due to padding and type-width differences.
> My personal guess is that most people will prefer the fast > hash functions over the ones that solve their potential future > migration problems, but, hey, options are good. I'm pretty sure that will be the case. I'm not sure that adding infrastructure to allow for something that nobody will use in practice is a good idea. If there ends up being demand for it, we can still go there. I think that the number of people that migrate between architectures is low enough that this isn't going to be a very common issue. Having some feasible way around this is important, but I don't think we should optimize heavily for it by developing new infrastructure / complicating experience for the 'normal' uses. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 1:35 PM, Joe Conway wrote: >> That's a good point, but the flip side is that, if we don't have >> such a rule, a pg_dump of a hash-partitioned table on one >> architecture might fail to restore on another architecture. Today, I >> believe that, while the actual database cluster is >> architecture-dependent, a pg_dump is architecture-independent. Is it >> OK to lose that property? > > Not from where I sit. It was pointed out at PGCon that we've actually already crossed this Rubicon. If you create a range-partitioned table, put a bunch of data into it, and then try to reload it on another system with a different set of encoding definitions, the proper partition for some row might be different. That would result in the dump failing to reload with a complaint about the partition key being violated. And, in fact, you could have the exact same issue on earlier releases which don't have partitioning, because a CHECK constraint of the form (a >= 'something' AND a < 'something else') could be true under one encoding and false under another, and you could define such a constraint on any release (from this millennium, anyway). I'm not actually aware of an instance where this has bitten anyone, even though it seems like it certainly could have and maybe should've gotten somebody at some point. Has anyone else? I think it's a reasonable guess that such problems will become more common with the advent of partitioning and more common still as we continue to improve partitioning, because people who otherwise would have given up on PostgreSQL -- or at least on partitioning -- will actually try to use it in earnest and then hit this problem. However, my guess is that it will still be pretty rare, and that having an optional --dump-partition-data-with-parent flag that can be used when it crops up will be an adequate workaround for most people. Of course, that is just my opinion.
So now I think that the right way to think about the issues around hash partitioning is as a new variant of a problem that we already have rather than an altogether new problem. IMHO, the right questions are: 1. Are the new problems worse than the old ones? 2. What could we do about it? On #1, I'd say tentatively yes. The problem of portability across encodings is probably not new. Suppose you have a table partitioned by range, either using the new range partitioning or using the old table inheritance method and CHECK constraints. If you move that table to a different encoding, will the collation behavior you get under the new encoding match the collation behavior you got under the old encoding? The documentation says: "Also, a collation is tied to a character set encoding (see Section 22.3). The same collation name may exist for different encodings", which makes it sound like it is possible but not guaranteed. Even if the same collation name happens to exist, there's no guarantee it behaves the same way under the new encoding, and given our experiences with glibc so far, I'd bet against it. If glibc doesn't even think strcoll() and strxfrm() need to agree with each other for the same collation, or that minor releases shouldn't whack the behavior around, there doesn't seem to be room for optimism about the possibility that they carefully preserve behavior across similarly-named collations on different encodings. On the other hand, collation rules probably tend to vary only around the edges, so there's a reasonably good chance that even if the collation rules change when you switch encodings, every row will still get put into the same partition as before. If we implement hashing for hash partitioning in some trivial way like hashing the raw bytes, that will most certainly not be true -- *most* rows will move to a different partition when you switch encodings. 
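The last point can be made concrete with a toy sketch (Python; `partition_for` is a hypothetical stand-in for a raw-bytes hash partitioning scheme, not PostgreSQL code). The same logical string stored under two encodings produces different bytes, so a hash of the stored bytes will generally route it to a different partition:

```python
import hashlib

def partition_for(raw: bytes, nparts: int = 16) -> int:
    # Naive scheme: hash whatever bytes are physically stored for the datum.
    h = int.from_bytes(hashlib.md5(raw).digest()[:8], 'little')
    return h % nparts

text = "café"
utf8_bytes = text.encode('utf-8')      # b'caf\xc3\xa9'
latin1_bytes = text.encode('latin-1')  # b'caf\xe9'
# Different stored bytes for the same logical string, so the two
# partition numbers agree only by a 1-in-16 accident.
p_utf8 = partition_for(utf8_bytes)
p_latin1 = partition_for(latin1_bytes)
```

A collation change can at worst reshuffle rows near range boundaries; an encoding change under raw-bytes hashing reshuffles nearly everything.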
Furthermore, neither range nor list partitioning depends on properties of the hardware, like how wide integers are, or whether they are stored big-endian. A naive approach to hash partitioning would depend on those things. That's clearly worse. On #2, I'll attempt to list the approaches that have been proposed so far: 1. Don't implement hash partitioning (Tom Lane). I don't think this proposal will attract many votes. 2. Add an option like --dump-partition-data-with-parent. I'm not sure who originally proposed this, but it seems that everybody likes it. What we disagree about is the degree to which it's sufficient. Jeff Davis thinks it doesn't go far enough: what if you have an old plain-format dump that you don't want to hand-edit, and the source database is no longer available? Most people involved in the unconference discussion of partitioning at PGCon seemed to feel that wasn't really something we should worry about too much. I had been taking that position also, more or less because I don't see that there are better alternatives. For instance, Jeff proposed having the COPY command specify both the parent and the child and provid
Re: [HACKERS] Hash Functions
On Fri, May 19, 2017 at 2:36 AM, Jeff Davis wrote: > I could agree to something like that. Let's explore some of the challenges > there and potential solutions: > > 1. Dump/reload of hash partitioned data. > > Falling back to restore-through-the-root seems like a reasonable answer > here. Moving to a different encoding is not an edge case, but it's not > common either, so a performance penalty seems acceptable. I'm not > immediately sure how we'd implement this in pg_dump/restore, so I'd feel a > little more comfortable if I saw a sketch. Right, I think this needs some investigation. I can't whip up a sketch on short notice, but I'll see if someone else at EnterpriseDB can work on it unless somebody else wants to take a crack at it. > 2. Having a lot of hash partitions would be cumbersome > > The user would need to create and manage each partition, and try to do > global operations in a sane way. The normal case would probably involve > scripts to do things like add an index to all partitions, or a column. Many > partitions would also just pollute the namespace unless you remember to put > them in a separate schema (yes, it's easy, but most people will still > forget). Some syntax sugar would go a long way here. I agree. Adding a column already cascades to all children, and there's a proposal to make the same thing true for indexes. See discussion beginning at http://postgr.es/m/c8fe4f6b-ff46-aae0-89e3-e936a35f0...@postgrespro.ru I haven't had time to review the code posted there yet, but I would like to see something along the lines discussed there committed to v11, and hopefully also something around foreign keys. It should be possible to create an outbound foreign key on a foreign table and have that cascade to all children. Conversely, it should also be possible to create a foreign key referencing a partitioned table provided that the foreign key references the partitioning key, and that there's a unique index on those same columns on every partition. 
(Referencing a foreign key that is not the partitioning key will have to wait for global indexes, I think.) These things are useful not just for hash partitioning, but also for list and range partitioning, and we'll save a lot of work if we can use the same infrastructure for both cases. > 3. The user would need to specify details they really don't care about for > each partition. > > Things like "modulus 16, remainder 0", "modulus 16, remainder 1" are tedious > boilerplate. And if the user makes a mistake, then 1/16 of inserts start > failing. Probably would be caught during testing, but not exactly a good > user experience. I'm not thrilled about this, considering that all the user > really wants is 16 partitions, but it's not the end of the world. As I said on the hash partitioning thread, I think the way to handle this is to get the basic feature in first, then add some convenience syntax to auto-create the partitions. That shouldn't be very difficult to implement; I just didn't want to complicate things more than necessary for the first version. The same issue arose when discussing list and range partitioning: Oracle has syntax like ours, but with the ability (requirement?) to create the partitions in the initial CREATE TABLE statement. However, I wanted to make sure we had the "inconvenient" syntax fully working and fully tested before we added that, because I believe pg_dump needs to dump everything out the long way. > 4. Detach is a foot-gun > > If you detach a partition, random inserts will start failing. Not thrilled > about this, but a hapless user would accept most of the blame if they > stumble over it. Another way of saying this is with hash partitioning you > really need the whole set for the table to be online at all. But we can't > really enforce that, because it would limit some of the flexibility that you > have in mind. Yes, I agree with all of that. I don't think it's really going to be a problem in practice. 
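The (modulus, remainder) bookkeeping discussed above can be mimicked with a small sketch (Python; `route` is an illustrative stand-in for the server's tuple routing, not actual PostgreSQL code). It also shows the detach foot-gun: with one partition of a complete hash set missing, a fixed slice of inserts has nowhere to go:

```python
def route(key_hash: int, partitions: list[tuple[int, int]]) -> int:
    # Each partition is declared by a (modulus, remainder) pair, e.g.
    # (16, 0), (16, 1), ... (16, 15) for a complete 16-way split.
    for i, (modulus, remainder) in enumerate(partitions):
        if key_hash % modulus == remainder:
            return i
    raise LookupError("no partition accepts this row")

full_set = [(16, r) for r in range(16)]        # the tedious boilerplate
assert route(12345, full_set) == 12345 % 16    # every hash value has a home

# Detach foot-gun: with remainder 3 missing, 1/16 of inserts fail.
with_gap = [(16, r) for r in range(16) if r != 3]
# route(19, with_gap) raises LookupError, since 19 % 16 == 3
```

A mistyped remainder has the same effect as a detached partition: the table looks fine until a row hashes into the hole.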
The only reason to detach a hash partition is if you want to split it, and we may eventually have convenience syntax to do that in an automated (i.e. less error-prone) way. If somebody does it manually and without a plan for putting back a replacement partition, they may be sad, but if somebody puts a CHECK (false) constraint on their table, they may be sad about that, too. It's more important to allow for flexibility than to prohibit every stupid thing somebody might try to do. Also, documentation helps. We've got a chapter on partitioning and it can be expanded to discuss these kinds of issues. > Stepping back, your approach might be closer to the general postgres > philosophy of allowing the user to assemble from spare parts first, then a > few releases later we offer some pre-built subassemblies, and a few releases > later we make the typical cases work out of the box. I'm fine with it as > long as we don't paint ourselves into a corner. That's basically my thinking here. Right now, our partitioni
Re: [HACKERS] Hash Functions
On Thu, May 18, 2017 at 1:53 AM, Jeff Davis wrote: > For instance, it makes little sense to have individual check > constraints, indexes, permissions, etc. on a hash-partitioned table. > It doesn't mean that we should necessarily forbid them, but it should > make us question whether combining range and hash partitions is really > the right design. I think that it definitely makes sense to have individual indexes on a hash-partitioned table. If you didn't, then as things stand today, you'd have no indexes at all, which can't be good. In the future, we might have some system where an index created on the parent cascades down to all of the children, but even then, you might want to REINDEX just one of those child indexes, or better yet, create a replacement index concurrently and then drop the old one concurrently. You might also want to add the same sort of new index to every partition, but not in a single operation - for reasons of load, length of maintenance window, time for which a snapshot is held open, etc. I agree that separate constraints and permissions on hash partitions don't make much sense. To a lesser extent, that's true of other kinds of partitioning as well. I mean, there is probably some use case for setting separate permissions on a range-partitioned table, but it's a pretty thin use case. It certainly seems possible that many users would prefer a rule that enforces uniform permissions across the entire partitioning hierarchy. This is one of the key things that had to be decided in regard to the partitioning implementation we now have: for which things should we enforce uniformity, and for which things should we allow diversity? I advocated for enforcing uniformity only in areas where we could see a clear advantage to it, which led to the fairly minimal approach of enforcing only that we had no multiple inheritance and no extra columns in the children, but that's certainly an arguable position. 
Other people argued for more restrictions, I believe out of a desire to create more administrative simplicity, but there is a risk of cutting yourself off from useful configurations there, and it seems very difficult to me to draw a hard line between what is useful and what is useless. For example, consider a hash-partitioned table. Could it make sense to have some but not all partitions be unlogged? I think it could. Suppose you have a cluster of machines each of which has a replica of the same hash-partitioned table. Each server uses logged tables for the partitions for which it is the authoritative source of information, and unlogged tables for the others. In the event of a crash, the data for any tables that are lost is replicated from the master for that machine. I can think of some disadvantages of that design, but I can think of some advantages, too, and I think it's pretty hard to say that nobody should ever want to do it. And if it's legitimate to want to do that, then what if I want to use trigger-based replication rather than logical replication? Then I might need triggers on some partitions but not all, or maybe different triggers on different partitions. Even for a permissions grant, suppose my production system is having some problem that can't be replicated on the test data set. Is it reasonable to want to give a trusted developer access to a slice of, but not all of, my production data? I could allow them access to just one partition. Maybe not a common desire, but is that enough reason to ban it? I'd say it's arguable. I don't think that there are bright lines around any of this stuff. My experience with this area has led me to give up on the idea of complete uniformity as impractical, and instead look at it from the perspective of "what do we absolutely have to ban in order for this to be sane?".
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On Wed, May 17, 2017 at 11:35 AM, Tom Lane wrote: > I think the question is whether we are going to make a distinction between > logical partitions (where the data division rule makes some sense to the > user) and physical partitions (where it needn't). I think it might be > perfectly reasonable for those to behave differently. Agreed. To summarize my perspective:

* hash partitioning offers a nice way to divide the data for later processing by parallel query
* range partitioning is good for partition elimination (constraint_exclusion) and separating hot/cold data (e.g. partitioning on date)
* both offer some maintenance benefits (e.g. reindex one partition at a time), though range partitioning seems like it offers better flexibility here in some cases

I lean toward separating the concepts, but Robert is making some reasonable arguments and I could be convinced. Regards, Jeff Davis
Re: [HACKERS] Hash Functions
On Wed, May 17, 2017 at 12:10 PM, Robert Haas wrote: > 1. To handle dump-and-reload the way we do partitioning today, hash > functions would need to be portable across encodings. > 2. That's impractically difficult. > 3. So let's always load data through the top-parent. > 4. But that could fail due to e.g. a UNIQUE index on an individual > child, so let's try to prohibit all of the things that could be done > to an individual partition that could cause a reload failure. > 5. And then for good measure let's hide the existence of the partitions, too. That is one thread of logic, but the discussion also highlighted some of the consequences of treating hash partitions like range/list partitions. For instance, it makes little sense to have individual check constraints, indexes, permissions, etc. on a hash-partitioned table. It doesn't mean that we should necessarily forbid them, but it should make us question whether combining range and hash partitions is really the right design. Regards, Jeff Davis
Re: [HACKERS] Hash Functions
On Wed, May 17, 2017 at 2:35 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, May 16, 2017 at 4:25 PM, Jeff Davis wrote: >>> Why can't hash partitions be stored in tables the same way as we do TOAST? >>> That should take care of the naming problem. > >> Hmm, yeah, something like that could be done, but every place where >> you are currently allowed to refer to a partition by name would have >> to be changed to accept some other syntax for specifying the >> partition. > > Uh ... toast tables have regular names, and can be specified in commands > just like any other table. I don't see why these "auto" partition tables > couldn't be handled the same way. Really? That seems like a huge usability fail to me. If somebody wants to create an index on one partition of a hash-partitioned table, or reindex an index, do you really want them to have to dig out an internal name to do it? And how exactly would you dump and restore the partitions and their indexes? It's true that there are some operations that can be performed directly on a TOAST table, but the saving grace is that you usually don't need to do any of them. That won't be true here. >> Beyond that, I think it's a bad idea to make hash partitions behave >> completely differently from list and range partitions. > > I think the question is whether we are going to make a distinction between > logical partitions (where the data division rule makes some sense to the > user) and physical partitions (where it needn't). I think it might be > perfectly reasonable for those to behave differently. I don't think I'd like to go so far as to say that it's unreasonable, but I certainly wouldn't say I'm optimistic about such a design. I do not think that it is going to work to conceal from the user that the partitions are really separate tables with their own indexes.
I also think that trying to make such a thing work is just going to lead to a lot of time and energy spent trying to paper over problems that are basically self-inflicted, and that papering over those problems won't really end up having any value for users. Remember, the chain of reasoning here is something like: 1. To handle dump-and-reload the way we do partitioning today, hash functions would need to be portable across encodings. 2. That's impractically difficult. 3. So let's always load data through the top-parent. 4. But that could fail due to e.g. a UNIQUE index on an individual child, so let's try to prohibit all of the things that could be done to an individual partition that could cause a reload failure. 5. And then for good measure let's hide the existence of the partitions, too. Every step in that chain of logic has a certain sense to it, but none of them are exactly water-tight. #1 is basically a value judgement: would people rather (a) have faster hash functions, or (b) be able to port a database to a different encoding without having rows move between partitions? The statement is only true if you think it's the latter, but I tend to think it's the former. #2 is a judgement that the performance characteristics of as-yet-unwritten portable hashing will be so bad that nobody could possibly be satisfied with it. #3 is a great idea as an optional behavior, but it's only a strict necessity if you're totally committed to #1 and #2. It also has some performance cost, which makes it somewhat undesirable as a default behavior. #4 is *probably* a necessary consequence of #3. I don't know what the argument for #5 is unless it's that #4 isn't hard enough already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Tue, May 16, 2017 at 4:25 PM, Jeff Davis wrote: >> Why can't hash partitions be stored in tables the same way as we do TOAST? >> That should take care of the naming problem. > Hmm, yeah, something like that could be done, but every place where > you are currently allowed to refer to a partition by name would have > to be changed to accept some other syntax for specifying the > partition. Uh ... toast tables have regular names, and can be specified in commands just like any other table. I don't see why these "auto" partition tables couldn't be handled the same way. > Beyond that, I think it's a bad idea to make hash partitions behave > completely differently from list and range partitions. I think the question is whether we are going to make a distinction between logical partitions (where the data division rule makes some sense to the user) and physical partitions (where it needn't). I think it might be perfectly reasonable for those to behave differently. regards, tom lane
Re: [HACKERS] Hash Functions
On Tue, May 16, 2017 at 4:25 PM, Jeff Davis wrote: > Why can't hash partitions be stored in tables the same way as we do TOAST? > That should take care of the naming problem. Hmm, yeah, something like that could be done, but every place where you are currently allowed to refer to a partition by name would have to be changed to accept some other syntax for specifying the partition. Even with all the things you propose to disallow, things like CLUSTER, VACUUM, ANALYZE, etc. would still have to be accepted on individual partitions. I suspect even CREATE INDEX and DROP INDEX would need to be accepted on individual partitions, because if an index on one partition somehow becomes bloated while the corresponding indexes on other partitions are OK, you'll want to create a replacement index concurrently and drop the old one. Of course, for similar reasons, you'd need some way for \d on the parent to display information on indexes on all the children, and all of that output would have to be frobbed to use whatever syntax is now required, in lieu of a name, to use an individual partition. Error messages would have to be adjusted in probably quite a few places to use the new notation, too. And on and on. It's not impossible to do, but we could end up chasing down loose ends for a very long time. Beyond that, I think it's a bad idea to make hash partitions behave completely differently from list and range partitions. That's a lot of extra code to maintain, and a lot of extra notional complexity for users, for really not a lot of benefit. I think we're taking a significant but not overwhelming problem -- our current hash functions aren't portable -- and through a chain of logical links eventually ending up with the conclusion that the design of partitioning needs to be totally overhauled. I want to resist that conclusion.
I'm not saying that the problem isn't a problem, or that there's not some logic to each step in the chain, but it's not that hard to blow a small problem up into a huge one by assuming the worst possible consequences or the tightest possible requirements at each step. http://tvtropes.org/pmwiki/pmwiki.php/Main/ForWantOfANail is not an argument for stricter regulation of the blacksmith industry. >> If Java has portable hash functions, why can't we? > > Java standardizes on a particular unicode encoding (utf-16). Are you > suggesting that we do the same? Or is there another solution that I am > missing? Well, I've already said several times (and Peter Eisentraut has agreed) that we don't really need the hash functions to be portable across encodings. I think there are at least three good arguments for that position. First, as Peter Geoghegan points out, the world is increasingly standardizing on Unicode, and that trend seems likely to continue. I strongly suspect that UTF-8 is the most common database encoding by a wide margin. There may occasionally be reasons to avoid it if, for example, you're using one of the Eastern languages that doesn't play entirely nicely with UTF-8, or if you happen to be storing a large number of characters that can be represented in a single byte in some other encoding but which require multiple bytes in UTF-8, but for an awful lot of people UTF-8 just works and there's no need to think any further. So, a lot of people will never hit the problem of needing to migrate a database between encodings because they'll just use UTF-8. Second, if the previous argument turns out to be wrong and the world abandons UTF-8 in favor of some new and better system (UTF-9?), or if users frequently want to make some other encoding conversion like Tom's original example of LATIN1 -> UTF-8, we've already got a proposed workaround for that case which seems like it will work just fine.
Just dump your data with pg_dump --insert-hash-partitioned-data-into-parent and reload on the new system. This isn't absolutely guaranteed to work if you've done something silly that will make the load of a particular row work on one partition and fail on some other one, but you probably won't do that because it would be dumb. Also, it will be a bit slower than a regular dump-and-reload cycle because tuple routing isn't free. Neither of these problems really sound very bad. If we're going to start fixing things that could cause database migrations/upgrades to occasionally fail in corner cases or run more slowly than expected, there's a long list of things that you can do in your DDL that will make pg_upgrade bomb out, and many of them are things that bite users with some regularity (e.g. tablespaces inside the data directory or other tablespaces, dependencies on system objects that are changed in the new version, ALTER USER .. SET ROLE). For whatever reason, we haven't viewed those warts as really high-priority items in need of fixing; in some cases, instead of actually trying to improve usability, we've all but mocked the people reporting those issues for having the temerity to do configur
Re: [HACKERS] Hash Functions
On Tue, May 16, 2017 at 8:40 PM, Jeff Davis wrote: > On Mon, May 15, 2017 at 1:04 PM, David Fetter wrote: >> As the discussion has devolved here, it appears that there are, at >> least conceptually, two fundamentally different classes of partition: >> public, which is to say meaningful to DB clients, and "private", used >> for optimizations, but otherwise opaque to DB clients. >> >> Mashing those two cases together appears to cause more problems than >> it solves. > > I concur at this point. I originally thought hash functions might be > made portable, but I think Tom and Andres showed that to be too > problematic -- the issue with different encodings is the real killer. > > But I also believe hash partitioning is important and we shouldn't > give up on it yet. > > That means we need to have a concept of hash partitions that's > different from range/list partitioning. The terminology > "public"/"private" does not seem appropriate. Logical/physical or > external/internal might be better. > > With hash partitioning: > * User only specifies number of partitions of the parent table; does > not specify individual partition properties (modulus, etc.) A well-distributed integer column doesn't even need to be hashed; a simple modulo works with it. If we are going towards "implicit" (yet another name) partitioning, we could choose the strategy based on the data type of the partition key, not just hash it always. Although, we might end up hashing it in most of the cases. > * Dump/reload goes through the parent table (though we may provide > options so pg_dump/restore can optimize this) Probably you mean the immediate hash-partitioned parent, but just let me clarify it a bit. We support multi-level partitioning with each partitioned table anywhere in the partitioning hierarchy choosing any partitioning scheme.
So, we can have a range-partitioned table as a partition of a hash-partitioned table or, for that matter, a non-hash-partitioned table which is somewhere in the hierarchy rooted at the hash-partitioned table. So, for range/list or even hash partitions that are grand-children of a hash-partitioned table, we will need to route dump/reload through that hash-partitioned table i.e. route it through the topmost hash-partitioned table. > * We could provide syntax to adjust the number of partitions, which > would be expensive but still useful sometimes. > * All DDL should be on the parent table, including check constraints, > FKs, unique constraints, exclusion constraints, indexes, etc. i.e. topmost hash partitioned table as explained above. > - Unique and exclusion constraints would only be permitted if the > keys are a superset of the partition keys. Do you think this constraint applies even after we support global indexes? Isn't this applicable to all the partitioning strategies? > - FKs would only be permitted if the two table's partition schemes > match and the keys are members of the same hash opfamily (this could > be relaxed slightly, but it gets a little confusing if so) > * No attach/detach of partitions It will be good if we can support this for maintenance purposes. If a partition goes bad, we could replace it with its copy somewhere, using attach and detach without affecting the whole table. Now does that mean that we will need to support some form of pg_dump/copy with a special flag to create copies of individual partitions? I think that proposal has already been floated. > * All partitions have the same permissions Why's that? > > The only real downside is that it could surprise users -- why can I > add a CHECK constraint on my range-partitioned table but not the > hash-partitioned one? We should try to document this so users don't > find that out too far along. As long as they aren't surprised, I think > users will understand why these aren't quite the same concepts.
> For a transparent hash (non-transparent in the sense of what Mark Dilger proposed), any constraint other than implicit partitioning constraint is applicable to the whole table or it's not applicable at all. So, better if user adds it on the parent hash table. So, yes, with this reasoning, we could document this fact. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Hash Functions
On 2017/05/17 5:25, Jeff Davis wrote: > On Tuesday, May 16, 2017, Robert Haas wrote: >> I don't really find this a very practical design. If the table >> partitions are spread across different relfilenodes, then those >> relfilenodes have to have separate pg_class entries and separate >> indexes, and those indexes also need to have separate pg_class >> entries. Otherwise, nothing works. And if they do have separate >> pg_class entries, then the partitions have to have their own names, >> and likewise for their indexes, and a dump-and-reload has to preserve >> those names. If it doesn't, and those objects get new system-assigned >> names after the dump-and-reload, then dump restoration can fail when a >> system-assigned name collides with an existing name that is first >> mentioned later in the dump. > > Why can't hash partitions be stored in tables the same way as we do TOAST? > That should take care of the naming problem. There is only one TOAST table per relation though, containing all of the relation's TOASTed data. Doesn't the multiplicity of hash partitions pose a problem? Will a hash partition of a given name end up with the same subset of data in the target database as it did in the source database? I suppose it won't matter though if we make hash partitions an implementation detail of hash partitioned tables to the extent you described in your earlier email [1], whereby it doesn't matter to an application which partition contains what subset of the table's data. Thanks, Amit [1] https://www.postgresql.org/message-id/CAMp0ubcQ3VYdU1kNUCOmpj225U4hk6ZEoaUVeReP8h60p%2Bmv1Q%40mail.gmail.com
Re: [HACKERS] Hash Functions
On 5/16/17 11:10, Jeff Davis wrote: > I concur at this point. I originally thought hash functions might be > made portable, but I think Tom and Andres showed that to be too > problematic -- the issue with different encodings is the real killer. I think it would be OK that if you want to move a hash-partitioned table to a database with a different encoding, you have to do dump/restore through the parent table. This is quite similar to what you have to do now if you want to move a range-partitioned table to a database with a different locale. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Tuesday, May 16, 2017, Robert Haas wrote: > I don't really find this a very practical design. If the table > partitions are spread across different relfilenodes, then those > relfilenodes have to have separate pg_class entries and separate > indexes, and those indexes also need to have separate pg_class > entries. Otherwise, nothing works. And if they do have separate > pg_class entries, then the partitions have to have their own names, > and likewise for their indexes, and a dump-and-reload has to preserve > those names. If it doesn't, and those objects get new system-assigned > names after the dump-and-reload, then dump restoration can fail when a > system-assigned name collides with an existing name that is first > mentioned later in the dump. Why can't hash partitions be stored in tables the same way as we do TOAST? That should take care of the naming problem. > If Java has portable hash functions, why can't we? Java standardizes on a particular unicode encoding (utf-16). Are you suggesting that we do the same? Or is there another solution that I am missing? Regards, Jeff Davis
Re: [HACKERS] Hash Functions
On Tue, May 16, 2017 at 08:10:39AM -0700, Jeff Davis wrote: > On Mon, May 15, 2017 at 1:04 PM, David Fetter wrote: > > As the discussion has devolved here, it appears that there are, at > > least conceptually, two fundamentally different classes of partition: > > public, which is to say meaningful to DB clients, and "private", used > > for optimizations, but otherwise opaque to DB clients. > > > > Mashing those two cases together appears to cause more problems than > > it solves. > > I concur at this point. I originally thought hash functions might be > made portable, but I think Tom and Andres showed that to be too > problematic -- the issue with different encodings is the real killer. > > But I also believe hash partitioning is important and we shouldn't > give up on it yet. > > That means we need to have a concept of hash partitions that's > different from range/list partitioning. The terminology > "public"/"private" does not seem appropriate. Logical/physical or > external/internal might be better. I'm not attached to any particular terminology. > With hash partitioning: > * User only specifies number of partitions of the parent table; does > not specify individual partition properties (modulus, etc.) Maybe this is over-thinking it, but I'm picturing us ending up with something along the lines of: PARTITION BY INTERNAL(EXPRESSION) [ WITH ( [PARAMETERS] [, AS, APPROPRIATE] ) ] i.e. it's not clear that we should wire in "number of partitions" as a parameter. In a not that distant future, ANALYZE and similar could have a say in determining both the "how" and the "whether" of partitioning. > * Dump/reload goes through the parent table (though we may provide > options so pg_dump/restore can optimize this) Would it be simplest to default to routing through the immediate ancestor for now? It occurs to me that with the opaque partition system we're designing here, internal partitions would necessarily be leaves in the tree. 
> * We could provide syntax to adjust the number of partitions, which > would be expensive but still useful sometimes. Yep. I suspect that techniques for this are described in literature, and possibly even in code bases. Any pointers? > * All DDL should be on the parent table, including check constraints, > FKs, unique constraints, exclusion constraints, indexes, etc. Necessarily. > - Unique and exclusion constraints would only be permitted if the > keys are a superset of the partition keys. "Includes either all of the partition expression or none of it," maybe? > - FKs would only be permitted if the two table's partition schemes > match and the keys are members of the same hash opfamily (this could > be relaxed slightly, but it gets a little confusing if so) Relaxing sounds like a not-in-the-first-cut feature, and subtle. > * No attach/detach of partitions Since they're opaque, this is the only sane thing. > * All partitions have the same permissions Since they're opaque, this is the only sane thing. > * Individual partitions would only be individually-addressable for > maintenance (like reindex and vacuum), but not for arbitrary queries Since they're opaque, this is the only sane thing. > - perhaps also COPY for bulk loading/dumping, in case we get clients > smart enough to do their own hashing. This is appealing from a resource allocation point of view in the sense of deciding where the hash computing resources are spent. Do we want something like the NOT VALID/VALIDATE infrastructure to support it? > The only real downside is that it could surprise users -- why can I > add a CHECK constraint on my range-partitioned table but not the > hash-partitioned one? We should try to document this so users don't > find that out too far along. As long as they aren't surprised, I think > users will understand why these aren't quite the same concepts. +1 Best, David. 
-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Tue, May 16, 2017 at 11:10 AM, Jeff Davis wrote: > With hash partitioning: > * User only specifies number of partitions of the parent table; does > not specify individual partition properties (modulus, etc.) > * Dump/reload goes through the parent table (though we may provide > options so pg_dump/restore can optimize this) > * We could provide syntax to adjust the number of partitions, which > would be expensive but still useful sometimes. > * All DDL should be on the parent table, including check constraints, > FKs, unique constraints, exclusion constraints, indexes, etc. > - Unique and exclusion constraints would only be permitted if the > keys are a superset of the partition keys. > - FKs would only be permitted if the two table's partition schemes > match and the keys are members of the same hash opfamily (this could > be relaxed slightly, but it gets a little confusing if so) > * No attach/detach of partitions > * All partitions have the same permissions > * Individual partitions would only be individually-addressable for > maintenance (like reindex and vacuum), but not for arbitrary queries > - perhaps also COPY for bulk loading/dumping, in case we get clients > smart enough to do their own hashing. I don't really find this a very practical design. If the table partitions are spread across different relfilenodes, then those relfilenodes have to have separate pg_class entries and separate indexes, and those indexes also need to have separate pg_class entries. Otherwise, nothing works. And if they do have separate pg_class entries, then the partitions have to have their own names, and likewise for their indexes, and a dump-and-reload has to preserve those names. If it doesn't, and those objects get new system-assigned names after the dump-and-reload, then dump restoration can fail when a system-assigned name collides with an existing name that is first mentioned later in the dump. 
If we had the ability to have anonymous pg_class entries -- relations that have no names -- then maybe it would be possible to make something like what you're talking about work. But that does not seem easy to do. There's a unique index on (relname, relnamespace) for good reason, and we can't make it partial on a system catalog. We could make the relname column allow nulls, but that would add overhead to any code that needs to access the relation name, and there's a fair amount of that. Similarly, if we had the ability to associate multiple relfilenodes with a single relation, and if index entries could point to <relfilenode, block, offset> rather than just <block, offset>, then we could also make this work. But either of those things would require significant re-engineering and would have downsides in other cases. If Java has portable hash functions, why can't we? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 1:04 PM, David Fetter wrote: > As the discussion has devolved here, it appears that there are, at > least conceptually, two fundamentally different classes of partition: > public, which is to say meaningful to DB clients, and "private", used > for optimizations, but otherwise opaque to DB clients. > > Mashing those two cases together appears to cause more problems than > it solves. I concur at this point. I originally thought hash functions might be made portable, but I think Tom and Andres showed that to be too problematic -- the issue with different encodings is the real killer. But I also believe hash partitioning is important and we shouldn't give up on it yet. That means we need to have a concept of hash partitions that's different from range/list partitioning. The terminology "public"/"private" does not seem appropriate. Logical/physical or external/internal might be better. With hash partitioning: * User only specifies number of partitions of the parent table; does not specify individual partition properties (modulus, etc.) * Dump/reload goes through the parent table (though we may provide options so pg_dump/restore can optimize this) * We could provide syntax to adjust the number of partitions, which would be expensive but still useful sometimes. * All DDL should be on the parent table, including check constraints, FKs, unique constraints, exclusion constraints, indexes, etc. - Unique and exclusion constraints would only be permitted if the keys are a superset of the partition keys. 
- FKs would only be permitted if the two table's partition schemes match and the keys are members of the same hash opfamily (this could be relaxed slightly, but it gets a little confusing if so) * No attach/detach of partitions * All partitions have the same permissions * Individual partitions would only be individually-addressable for maintenance (like reindex and vacuum), but not for arbitrary queries - perhaps also COPY for bulk loading/dumping, in case we get clients smart enough to do their own hashing. The only real downside is that it could surprise users -- why can I add a CHECK constraint on my range-partitioned table but not the hash-partitioned one? We should try to document this so users don't find that out too far along. As long as they aren't surprised, I think users will understand why these aren't quite the same concepts. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
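Jeff's design, where the user specifies only a partition count and all routing is internal, can be sketched with a simple hash-modulus scheme. This is an illustrative model only (Python, SHA-256 standing in for whatever hash opfamily function would actually be used), and it also shows why "syntax to adjust the number of partitions" is expensive: most rows land in a different partition after the count changes.

```python
# Illustrative sketch (not PostgreSQL's implementation): route rows to
# hash partitions by modulus, then see how many rows move when the
# partition count changes from 8 to 16.
import hashlib

def partition_for(key, nparts):
    # Stable hash of the key's bytes; hashlib is used instead of
    # Python's built-in hash(), which is randomized per process.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % nparts

keys = ["user%d" % i for i in range(1000)]
before = [partition_for(k, 8) for k in keys]
after = [partition_for(k, 16) for k in keys]
moved = sum(1 for b, a in zip(before, after) if b != a)
print("%d of %d rows change partition when going 8 -> 16" % (moved, len(keys)))
```

With a plain modulus, roughly half the rows move on a doubling (and nearly all move for a non-multiple count), which is why repartitioning amounts to rewriting the table.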
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 03:26:02PM -0400, Robert Haas wrote: > On Sun, May 14, 2017 at 9:35 PM, Andres Freund wrote: > > On 2017-05-14 21:22:58 -0400, Robert Haas wrote: > >> but wanting a CHECK constraint that applies to only one partition > >> seems pretty reasonable (e.g. CHECK that records for older years > >> are all in the 'inactive' state, or whatever). > > > > On a hash-partitioned table? > > No, probably not. But do we really want the rules for partitioned > tables to be different depending on the kind of partitioning in use? As the discussion has devolved here, it appears that there are, at least conceptually, two fundamentally different classes of partition: public, which is to say meaningful to DB clients, and "private", used for optimizations, but otherwise opaque to DB clients. Mashing those two cases together appears to cause more problems than it solves. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
> On May 15, 2017, at 7:48 AM, Jeff Davis wrote: > > On Sun, May 14, 2017 at 6:22 PM, Robert Haas wrote: >> You'd have to prohibit a heck of a lot more than that in order for >> this to work 100% reliably. You'd have to prohibit CHECK constraints, >> triggers, rules, RLS policies, and UNIQUE indexes, at the least. You >> might be able to convince me that some of those things are useless >> when applied to partitions, but wanting a CHECK constraint that >> applies to only one partition seems pretty reasonable (e.g. CHECK that >> records for older years are all in the 'inactive' state, or whatever). >> I think getting this to work 100% reliably in all cases would require >> an impractically large hammer. > > The more I think about it the more I think hash partitions are > "semi-logical". A check constraint on a single partition of a > range-partitioned table makes sense, but it doesn't make sense on a > single partition of a hash-partitioned table. That depends on whether the user gets to specify the hash function, perhaps indirectly by specifying a user defined opfamily. I can imagine clever hash functions that preserve certain properties of the incoming data, and check constraints in development versions of the database that help verify the hash is not violating those properties. That's not to say such hash functions must be allowed in the hash partitioning implementation; just that it does make sense if you squint and look a bit sideways at it. Mark Dilger -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 9:35 PM, Andres Freund wrote: > On 2017-05-14 21:22:58 -0400, Robert Haas wrote: >> but wanting a CHECK constraint that applies to only one partition >> seems pretty reasonable (e.g. CHECK that records for older years are >> all in the 'inactive' state, or whatever). > > On a hash-partitioned table? No, probably not. But do we really want the rules for partitioned tables to be different depending on the kind of partitioning in use? > I'm not saying it can't work for any datatype, I just think it'd be a > very bad idea to make it work for any non-trivial ones. The likelihood > of reguarly breaking or preventing us from improving things seems high. > I'm not sure that having a design where this most of the time works for > some datatypes is a good one. I think you might be engaging in undue pessimism here, but I suspect we need to actually try doing the work before we know how it will turn out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 07:48:14AM -0700, Jeff Davis wrote: > This would mean we need to reload through the root as Andres and > others suggested, One refinement of this would be to traverse the partition tree, stopping at the first place where the next branch has hash partitions, or at any rate types which have no application-level significance, and load from there. This could allow for parallelizing where it's portable to do so.

Level            Table   Partition Type
Base table:      Log     (N/A)
Next partition:  Year    (range)
Next partition:  Month   (range)
Next partition:  Day     (range)   <- Data gets loaded no lower than here
Next partition:  *       (hash)

That last, and any below it, doesn't have a specific name because they're just an implementation detail, i.e. none has any application-level meaning. > and disable a lot of logical partitioning capabilities. I'd be a little worried about what we do with attaching/detaching, though. Attaching and detaching partitions only makes sense for partitions whose partition keys have application-level meaning anyway. Does it make sense at this point to separate our partitions into two categories, those which can have significance to applications, and those which can't? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
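The traversal David describes can be sketched in a few lines. This is purely illustrative (the `Partition` class and table names are hypothetical, not PostgreSQL internals): descend the partition tree and pick load points at the last level before hash (application-opaque) partitioning begins; data below that point would be routed by the server.

```python
# Sketch of the proposed traversal: stop at the first place where the
# next branch has hash partitions, and load from there.
# Names and structure here are illustrative, not PostgreSQL internals.
from dataclasses import dataclass, field

@dataclass
class Partition:
    name: str
    strategy: str  # "range", "list", or "hash"
    children: list = field(default_factory=list)

def load_points(node: Partition) -> list:
    # If any child is hash-partitioned (opaque), stop here and load
    # through this table; otherwise keep descending.
    if not node.children or any(c.strategy == "hash" for c in node.children):
        return [node.name]
    points = []
    for child in node.children:
        points.extend(load_points(child))
    return points

# The Log -> Year -> Month -> Day -> hash hierarchy from the table above:
day = Partition("log_2017_05_15", "range",
                [Partition("h%d" % i, "hash") for i in range(4)])
month = Partition("log_2017_05", "range", [day])
tree = Partition("log", "range", [Partition("log_2017", "range", [month])])
print(load_points(tree))  # data gets loaded no lower than the Day level
```

Each returned load point could then be fed by a separate parallel loader, which is the parallelization opportunity David mentions.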
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 07:32:30AM -0700, Jeff Davis wrote: > On Sun, May 14, 2017 at 8:00 PM, Bruce Momjian wrote: > > Do we even know that floats are precise enough to determine the > > partition? For example, if you have 6.1, is it possible for > > that to be 5.999 on some systems? Are IEEE systems all the same for > > these values? I would say we should disallow any approximate data type > > for partitioning completely. > > I'm inclined in this direction, as well. Hash partitioning is mostly > useful for things that are likely to be join keys or group keys, and > floats aren't. Same for complex user-defined types. > > The real problem here is what Tom pointed out: that we would have > trouble hashing strings in an encoding-insensitive way. Strings are > useful as join/group keys, so it would be painful to not support them. Well, since we can't mix encodings in the same column, why can't we just hash the binary representation of the string? My point is that with hashing we aren't comparing one string with another, right? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
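The catch with hashing the binary representation is the portability concern raised earlier in the thread: the same logical string has different byte representations under different server encodings, so a hash of the raw bytes ties the partition assignment to the database encoding. A minimal demonstration (using Python and SHA-256 purely for illustration, not PostgreSQL's actual hash functions):

```python
# Sketch of the encoding problem: the same text yields different bytes,
# and therefore a different hash, in a LATIN1 vs a UTF8 database.
import hashlib

s = "café"
latin1_bytes = s.encode("latin-1")   # b'caf\xe9'
utf8_bytes = s.encode("utf-8")       # b'caf\xc3\xa9'

h_latin1 = hashlib.sha256(latin1_bytes).hexdigest()
h_utf8 = hashlib.sha256(utf8_bytes).hexdigest()
assert h_latin1 != h_utf8  # same text, different hash per encoding
```

Within a single database this is harmless, as Bruce notes, since the column has one encoding; it only bites when moving data between databases with different encodings.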
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 6:22 PM, Robert Haas wrote: > You'd have to prohibit a heck of a lot more than that in order for > this to work 100% reliably. You'd have to prohibit CHECK constraints, > triggers, rules, RLS policies, and UNIQUE indexes, at the least. You > might be able to convince me that some of those things are useless > when applied to partitions, but wanting a CHECK constraint that > applies to only one partition seems pretty reasonable (e.g. CHECK that > records for older years are all in the 'inactive' state, or whatever). > I think getting this to work 100% reliably in all cases would require > an impractically large hammer. The more I think about it the more I think hash partitions are "semi-logical". A check constraint on a single partition of a range-partitioned table makes sense, but it doesn't make sense on a single partition of a hash-partitioned table. I think hash partitioning is mainly useful for parallel query (so the optimizer needs to know about it) and maintenance commands (as you listed in another email). But those don't need hash portability. FKs are a little more interesting, but I actually think they still work regardless of hash portability. If the two sides are in the same hash opfamily, they should hash the same and it's fine. And if they aren't, the FK has no hope of working regardless of hash portability. This would mean we need to reload through the root as Andres and others suggested, and disable a lot of logical partitioning capabilities. I'd be a little worried about what we do with attaching/detaching, though. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 8:00 PM, Bruce Momjian wrote: > Do we even know that floats are precise enough to determine the > partition? For example, if you have 6.1, is it possible for > that to be 5.999 on some systems? Are IEEE systems all the same for > these values? I would say we should disallow any approximate data type > for partitioning completely. I'm inclined in this direction, as well. Hash partitioning is mostly useful for things that are likely to be join keys or group keys, and floats aren't. Same for complex user-defined types. The real problem here is what Tom pointed out: that we would have trouble hashing strings in an encoding-insensitive way. Strings are useful as join/group keys, so it would be painful to not support them. Perhaps there's some kind of compromise, like a pg_dump mode that reloads through the root. Or maybe hash partitions are really a "semi-logical" partitioning that the optimizer understands, but where things like per-partition check constraints don't make sense. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 01:06:03PM -0700, Andres Freund wrote: > On 2017-05-14 15:59:09 -0400, Greg Stark wrote: > > Personally while I would like to avoid code that actively crashes or > > fails basic tests on Vax > I personally vote for simply refusing to run/compile on non-IEEE > platforms, including VAX. The benefit of even trying to get that right, > not to speak of actually testing, seems just not there. Do we even know that floats are precise enough to determine the partition? For example, if you have 6.1, is it possible for that to be 5.999 on some systems? Are IEEE systems all the same for these values? I would say we should disallow any approximate data type for partitioning completely. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
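To Bruce's question: 6.1 has no exact binary representation, but on any IEEE 754 platform the double nearest to 6.1 is one fixed bit pattern, so IEEE systems agree with each other; the risk is only across different float formats. An illustrative sketch (Python, assuming an IEEE 754 host, which is what CPython requires in practice):

```python
# Sketch: 6.1 isn't exactly representable, but its stored IEEE 754 bit
# pattern is fixed and round-trips exactly on any IEEE platform.
import struct
from decimal import Decimal

bits = struct.pack("<d", 6.1)               # the 8-byte IEEE 754 image
print(bits.hex())                           # same on every IEEE host
print(Decimal(6.1))                         # the exact value stored
assert struct.unpack("<d", bits)[0] == 6.1  # round-trips exactly
assert Decimal(6.1) != Decimal("6.1")       # ...but isn't exactly 6.1
```

So within the IEEE world "6.1" never silently becomes 5.999; the hazard Bruce describes arises when converting between IEEE and non-IEEE representations (VAX, IBM hex float), where the nearest representable value can differ.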
Re: [HACKERS] Hash Functions
Hi, On 2017-05-14 21:22:58 -0400, Robert Haas wrote: > but wanting a CHECK constraint that applies to only one partition > seems pretty reasonable (e.g. CHECK that records for older years are > all in the 'inactive' state, or whatever). On a hash-partitioned table? > Now that's not to say that we shouldn't have a > reload-through-the-top-parent switch; actually, I think that's a good > idea. I just don't believe that it can ever be a complete substitute > for portable hash functions. I think almost everybody here agrees > that it isn't necessary to have hash functions that are 100% portable, > e.g. to VAX. But it would be nice IMHO if you could use an integer > column as the partitioning key and have that be portable between Intel > and POWER. I'm not saying it can't work for any datatype, I just think it'd be a very bad idea to make it work for any non-trivial ones. The likelihood of regularly breaking or preventing us from improving things seems high. I'm not sure that having a design that works most of the time for some datatypes is a good one. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 6:29 PM, Andres Freund wrote: > On 2017-05-14 18:25:08 -0400, Tom Lane wrote: >> It may well be that we can get away with saying "we're not going >> to make it simple to move hash-partitioned tables with float >> partition keys between architectures with different float >> representations". But there's a whole lot of daylight between that >> and denying any support for float representations other than the >> currently-most-popular one. > > Note that I, IIRC in the mail you responded to, also argued that I don't > think it'd be a good idea to rely on hashfunctions being portable. The > amount of lock-in that'd create, especially for more complex datatypes, > seems wholly inadvisable. I still think that dumping tables in a way > they're reloaded via the top-partition (probably one copy statement for > each child partition), and prohibiting incoming fkeys to partitions, is > a better approach to all this. You'd have to prohibit a heck of a lot more than that in order for this to work 100% reliably. You'd have to prohibit CHECK constraints, triggers, rules, RLS policies, and UNIQUE indexes, at the least. You might be able to convince me that some of those things are useless when applied to partitions, but wanting a CHECK constraint that applies to only one partition seems pretty reasonable (e.g. CHECK that records for older years are all in the 'inactive' state, or whatever). I think getting this to work 100% reliably in all cases would require an impractically large hammer. Now that's not to say that we shouldn't have a reload-through-the-top-parent switch; actually, I think that's a good idea. I just don't believe that it can ever be a complete substitute for portable hash functions. I think almost everybody here agrees that it isn't necessary to have hash functions that are 100% portable, e.g. to VAX. But it would be nice IMHO if you could use an integer column as the partitioning key and have that be portable between Intel and POWER. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
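The Intel/POWER case Robert mentions is concrete: both use IEEE 754 doubles, but they lay the bytes out in memory in opposite orders, so a hash computed over raw memory bytes is architecture-dependent even when the values are bit-identical. A hedged sketch of the problem and the usual fix (canonicalizing byte order before hashing; SHA-256 here is just a stand-in hash):

```python
# Sketch: the same IEEE 754 double in little-endian (Intel-style) vs
# big-endian (POWER-style) layout gives different raw bytes, so a
# naive hash-of-memory differs across architectures. Hashing a
# canonical byte order restores portability.
import hashlib
import struct

x = 1234.5678
le = struct.pack("<d", x)  # little-endian layout
be = struct.pack(">d", x)  # big-endian layout
assert le == be[::-1] and le != be
assert hashlib.sha256(le).digest() != hashlib.sha256(be).digest()

def portable_hash(value):
    # Canonicalize to one byte order before hashing, regardless of
    # the host's native endianness.
    return hashlib.sha256(struct.pack("<d", value)).digest()
```

This handles endianness, but, as Andres argues upthread, it does nothing for genuinely different representations (VAX or IBM hex floats) or for encoding-dependent types like text, which is where the portability guarantee gets expensive.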
Re: [HACKERS] Hash Functions
On Sun, May 14, 2017 at 3:30 PM, Tom Lane wrote: > I agree that the Far Eastern systems that can't easily be replaced > by Unicode are that way mostly because they're a mess. But I'm > still of the opinion that locking ourselves into Unicode is a choice > we might regret, far down the road. It's not a choice that has any obvious upside, so I have no reason to disagree. My point was only that Robert's contention that "You could argue that text-under-LATIN1 and text-under-UTF8 aren't really the same data type at all" seems wrong to me, because PostgreSQL seems to want to treat encoding as a property of the machine. This is evidenced by the fact that we expect applications to change client encoding "transparently". That is, client encoding may be changed without in any way affecting humans that speak a natural language that is provided for by the application's client encoding. That's a great ideal to have, and one that is very close to completely workable. -- Peter Geoghegan VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 10:08 AM, Thomas Munro wrote: > [2] > https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop Though looking more closely I see that the default is IEEE in 64 bit builds, which seems like a good way to kill the older format off. If/when someone gets PostgreSQL ported to z/OS it probably won't be because they want to run it on an ancient 32 bit mainframe. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Peter Geoghegan writes: > The express goal of the Unicode consortium is to replace all existing > encodings with Unicode. My personal opinion is that a Unicode > monoculture would be a good thing, provided reasonable differences can > be accommodated. Can't help remembering Randall Munroe's take on such things: https://xkcd.com/927/ I agree that the Far Eastern systems that can't easily be replaced by Unicode are that way mostly because they're a mess. But I'm still of the opinion that locking ourselves into Unicode is a choice we might regret, far down the road. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-14 18:25:08 -0400, Tom Lane wrote: > It may well be that we can get away with saying "we're not going > to make it simple to move hash-partitioned tables with float > partition keys between architectures with different float > representations". But there's a whole lot of daylight between that > and denying any support for float representations other than the > currently-most-popular one. Note that I, IIRC in the mail you responded to, also argued that I don't think it'd be a good idea to rely on hashfunctions being portable. The amount of lock-in that'd create, especially for more complex datatypes, seems wholly inadvisable. I still think that dumping tables in a way they're reloaded via the top-partition (probably one copy statement for each child partition), and prohibiting incoming fkeys to partitions, is a better approach to all this. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Andres Freund writes: > On 2017-05-14 15:59:09 -0400, Greg Stark wrote: >> Personally while I would like to avoid code that actively crashes or >> fails basic tests on Vax > I personally vote for simply refusing to run/compile on non-IEEE > platforms, including VAX. The point of wanting that is not because anybody thinks that VAX per se is an interesting platform anymore. The point is to avoid locking ourselves into narrow assumptions that we may need to break someday. Saying "nobody cares about floats other than IEEE floats" is morally indistinguishable from saying "nobody cares about running on any platform except Windows", which was a depressingly common opinion back in the nineties or so. It may well be that we can get away with saying "we're not going to make it simple to move hash-partitioned tables with float partition keys between architectures with different float representations". But there's a whole lot of daylight between that and denying any support for float representations other than the currently-most-popular one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Mon, May 15, 2017 at 7:59 AM, Greg Stark wrote: > On 13 May 2017 at 10:29, Robert Haas wrote: >> - Floats. There may be different representations in use on different >> hardware, which could be a problem. Tom didn't answer my question >> about whether any even-vaguely-modern hardware is still using non-IEEE >> floats, which I suspect means that the answer is "no". > > Fwiw the answer to that is certainly no. The only caveat is that some > platforms have not entirely complete implementations of IEEE missing > corner cases such as denormalized values but I don't think that would > be something that would be changed with a different hash function > though. Well... along with the Intel/IEEE-754 and VAX formats, there is a third floating point format that is/was in widespread use: IBM hex float[1]. It's base 16 rather than base 2, and might be the default on some IBM operating systems[2] for the C float and double types (but fortunately not xlc for AIX or Linux, and I have no clue about i/OS). This is probably irrelevant because it looks like people aren't running PostgreSQL on z/OS right now[3], but unlike VAXen these systems are alive and well so I just wanted to respond to the implication on this thread that the whole world is a VAX or an IEEE system :-) People really use this... I happen to know a shop that has petabytes of IBM hex float data. (IBM hardware also supports base 10 floats, but they show up as different types in C so not relevant.) [1] https://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture [2] https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcux01/flotcop.htm#flotcop [3] https://www.postgresql.org/message-id/flat/BLU437-SMTP4B3FF36035D8A3C3816D49C160%40phx.gbl -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 9:11 PM, Robert Haas wrote: > The latter is > generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what > about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing > list discussions, I'm under the impression that it is sometimes > debatable how a character in one encoding should be converted to some > other encoding, either because it's not clear whether there is a > mapping at all or it's unclear what mapping should be used. The express goal of the Unicode consortium is to replace all existing encodings with Unicode. My personal opinion is that a Unicode monoculture would be a good thing, provided reasonable differences can be accommodated. So, it might be that there is ambiguity about how one codepoint can be converted to another in another encoding, but that's because encodings like SJIS and BIG5 are needlessly ambiguous. It has nothing to do with cultural preferences leaving the question undecidable (at least by a panel of natural language experts), and everything to do with these regional character encoding systems being objectively bad. They richly deserve to die, and are in fact dying. Encoding actually *is* a property of the machine, even though regional encodings obfuscate things. There is a reason why MacOS and Java use UTF-16 rather than UTF-8, and there is a reason why the defacto standard on the web is UTF-8, and those reasons are completely technical. AFAICT, whatever non-technical reasons remain are actually technical debt in disguise. Where this leaves hash partitioning, I cannot say. -- Peter Geoghegan VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-14 15:59:09 -0400, Greg Stark wrote: > Personally while I would like to avoid code that actively crashes or > fails basic tests on Vax I personally vote for simply refusing to run/compile on non-IEEE platforms, including VAX. The benefit of even trying to get that right, not to speak of actually testing it, just doesn't seem to be there. > even I don't think we need to worry about > replication or federated queries in a heterogeneous environment where > some servers are Vaxen and some are modern hardware. That seems a bit > far-fetched. Imo there's little point in trying to delineate some subset that we want to support on platforms that are 20 years out of date. Either we do, or don't. And since there's no point aside from some intellectual curiosity... On the other hand, I don't believe that my opinion (that requiring hashing to be portable is unrealistic) is meaningfully affected by disallowing non-IEEE floats. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 13 May 2017 at 10:29, Robert Haas wrote: > - Floats. There may be different representations in use on different > hardware, which could be a problem. Tom didn't answer my question > about whether any even-vaguely-modern hardware is still using non-IEEE > floats, which I suspect means that the answer is "no". Fwiw the answer to that is certainly no. The only caveat is that some platforms have incomplete IEEE implementations that miss corner cases such as denormalized values, but I don't think that is something a different hash function would change. Personally, while I would like to avoid code that actively crashes or fails basic tests on Vax, even I don't think we need to worry about replication or federated queries in a heterogeneous environment where some servers are Vaxen and some are modern hardware. That seems a bit far-fetched. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 11:47 PM, Andres Freund wrote: > It'll be differently sized on different platforms. So everyone will have to > write hash functions that look at each member individually, rather than > hashing the entire struct at once. And for each member you'll have to use a > type specific hash function... Well, that's possibly kind of annoying, but it still sounds like pretty mechanical work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 1:57 PM, Tom Lane wrote: > Basically, this is simply saying that you're willing to ignore the > hard cases, which reduces the problem to one of documenting the > portability limitations. You might as well not even bother with > worrying about the integer case, because porting between little- > and big-endian systems is surely far less common than cases you've > already said you're okay with blowing off. > > That's not an unreasonable position to take, perhaps; doing better > than that is going to be a lot more work and it's not very clear > how much real-world benefit results. But I can't follow the point > of worrying about endianness but not encoding. Encoding is a user choice, not a property of the machine. Or, looking at it from another point of view, the set of values that can be represented by an int4 is the same whether they are represented in big-endian form or in little-endian form, but the set of values that are representable changes when you switch encodings. You could argue that text-under-LATIN1 and text-under-UTF8 aren't really the same data type at all. It's one thing to say "you can pick up your data and move it to a different piece of hardware and nothing will break". It's quite another thing to say "you can pick up your data and convert it to a different encoding and nothing will break". The latter is generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing list discussions, I'm under the impression that it is sometimes debatable how a character in one encoding should be converted to some other encoding, either because it's not clear whether there is a mapping at all or it's unclear what mapping should be used. 
See, e.g., 2dbbf33f4a95cdcce66365bcdb47c885a8858d3c, or https://www.postgresql.org/message-id/1739a900-30ab-f48e-aec4-2b35475ecf02%402ndquadrant.com where it was discussed that being able to convert encoding A -> encoding B does not guarantee the ability to perform the reverse conversion. Arguing that a given int4 value should hash to the same value on every platform seems like a request that is at least superficially reasonable, if possibly practically tricky in some cases. Arguing that every currently supported encoding should hash every character the same way when they don't all have the same set of characters and the mappings between them are occasionally debatable is asking for the impossible. I certainly don't want to commit to a design for hash partitioning that involves a compatibility break any time somebody changes any encoding conversion in the system, even if a hash function that involved translating every character to some sort of universal code point before hashing it didn't seem likely to be horribly slow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
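A tiny sketch of the encoding point above (FNV-1a is used as a stand-in byte-wise hash; PostgreSQL's actual text hashing differs, but any hash over raw bytes behaves the same way): the character 'é' is one byte in LATIN1 and two bytes in UTF8, so a hash that is perfectly stable across hardware still changes when the data is converted to another encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in byte-wise hash (FNV-1a); PostgreSQL's own text hashing
 * differs, but any hash over raw bytes has the same property. */
static uint64_t fnv1a(const unsigned char *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* The same character, 'é', as stored under two server encodings: */
static const unsigned char e_latin1[] = { 0xE9 };        /* LATIN1 */
static const unsigned char e_utf8[]   = { 0xC3, 0xA9 };  /* UTF8   */
```

The two byte sequences hash to different values, so a dump transported across an encoding conversion would route rows to different hash partitions.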
Re: [HACKERS] Hash Functions
On May 13, 2017 8:44:22 PM PDT, Robert Haas wrote: >On Sat, May 13, 2017 at 7:08 PM, Andres Freund >wrote: >> I seriously doubt that's true. A lot of more complex types have >> internal alignment padding and such. > >True, but I believe we require those padding bytes to be zero. If we >didn't, then hstore_hash would be broken already. It'll be differently sized on different platforms. So everyone will have to write hash functions that look at each member individually, rather than hashing the entire struct at once. And for each member you'll have to use a type specific hash function... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
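A sketch of the member-by-member approach Andres describes (the struct, the FNV-1a stand-in hash, and the XOR combiner are illustrative, not PostgreSQL's actual hash support functions): hash each field through its own hash function and skip the alignment padding, whose contents are unspecified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    char   tag;   /* followed by alignment padding on most platforms... */
    double val;   /* ...so the struct is wider than the sum of its members */
} padded_t;

/* Stand-in byte hash (FNV-1a). */
static uint64_t fnv1a(const unsigned char *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash each member individually so padding bytes never enter the hash.
 * (A real implementation would combine the per-member hashes more
 * carefully than a plain XOR.) */
static uint64_t hash_members(const padded_t *p)
{
    return fnv1a((const unsigned char *) &p->tag, sizeof p->tag)
         ^ fnv1a((const unsigned char *) &p->val, sizeof p->val);
}

/* Fill two structs with different garbage, then assign equal members:
 * the raw bytes differ (padding), but the member-wise hash agrees. */
static int padding_demo(void)
{
    padded_t a, b;

    memset(&a, 0x00, sizeof a);
    memset(&b, 0xAA, sizeof b);
    a.tag = b.tag = 'x';
    a.val = b.val = 1.5;
    if (memcmp(&a, &b, sizeof a) == 0)
        return -1;              /* no padding on this platform */
    return hash_members(&a) == hash_members(&b);
}
```

Hashing the whole struct with `memcmp`-style byte access would pick up whatever garbage sits in the padding, which is exactly why a per-member, per-type hash is needed.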
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 7:08 PM, Andres Freund wrote: > I seriously doubt that's true. A lot of more complex types have > internal alignment padding and such. True, but I believe we require those padding bytes to be zero. If we didn't, then hstore_hash would be broken already. > Consider e.g. something like > jsonb, hstore, or postgis types - you *can* convert them to something > that's unambiguous, but it's going to be fairly expensive. I'm fuzzy on what you think we'd need to do. > Essentially > you'd have to do something like calling the output function, and then > hashing the result of that. I really don't see why we'd have to go to nearly that length. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-13 10:29:09 -0400, Robert Haas wrote: > On Sat, May 13, 2017 at 12:52 AM, Amit Kapila wrote: > > Can we think of defining separate portable hash functions which can be > > used for the purpose of hash partitioning? > > I think that would be a good idea. I think it shouldn't even be that > hard. By data type: > > - Integers. We'd need to make sure that we get the same results for > the same value on big-endian and little-endian hardware, and that > performance is good on both systems. That seems doable. > > - Floats. There may be different representations in use on different > hardware, which could be a problem. Tom didn't answer my question > about whether any even-vaguely-modern hardware is still using non-IEEE > floats, which I suspect means that the answer is "no". If every bit > of hardware we are likely to find uses basically the same > representation of the same float value, then this shouldn't be hard. > (Also, even if this turns out to be hard for floats, using a float as > a partitioning key would be a surprising choice because the default > output representation isn't even unambiguous; you need > extra_float_digits for that.) > > - Strings. There's basically only one representation for a string. > If we assume that the hash code only needs to be portable across > hardware and not across encodings, a position for which I already > argued upthread, then I think this should be manageable. > > - Everything Else. Basically, everything else is just a composite of > that stuff, I think. I seriously doubt that's true. A lot of more complex types have internal alignment padding and such. Consider e.g. something like jsonb, hstore, or postgis types - you *can* convert them to something that's unambiguous, but it's going to be fairly expensive. Essentially you'd have to do something like calling the output function, and then hashing the result of that. And hash-partitioning is particularly interesting for e.g.
large amounts of points in a geospatial scenario, because other types of partitioning are quite hard to maintain. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 12:38 PM, Robert Haas wrote: > That is a good question. I think it basically amounts to this > question: is hash partitioning useful, and if so, for what? Two words: parallel query. To get parallelism, one of the best approaches is dividing the data, then doing as much work as possible before combining it again. If you have hash partitions on some key, then you can do partition-wise joins or partition-wise aggregation on that key in parallel with no synchronization/communication overhead (until the final result). You've taken postgres pretty far in this direction already; hash partitioning will take it one step further by allowing more pushdowns and lower sync/communication costs. Some of these things can be done with range partitioning, too, but see my other message here: https://www.postgresql.org/message-id/CAMp0ubfNMSGRvZh7N7TRzHHN5tz0ZeFP13Aq3sv6b0H37fdcPg%40mail.gmail.com Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 11:45 AM, Tom Lane wrote: > Forget hash partitioning. There's no law saying that that's a good > idea and we have to have it. With a different set of constraints, > maybe we could do it, but I think the existing design decisions have > basically locked it out --- and I doubt that hash partitioning is so > valuable that we should jettison other desirable properties to get it. A lot of the optimizations that can make use of hash partitioning could also make use of range partitioning. But let me defend hash partitioning: * hash partitioning requires fewer decisions by the user * naturally balances data and workload among partitions in most cases * easy to match with a degree of parallelism But with range partitioning, you can have situations where different tables have different distributions of data. If you partition to balance the data between partitions in both cases, then that makes partition-wise join a lot harder because the boundaries don't line up. If you make the boundaries line up to do partition-wise join, the partitions might have wildly different amounts of data in them. Either way, it makes parallelism harder. Even without considering joins, range partitioning could force you to make a choice between balancing the data and balancing the workload. If you are partitioning based on date, then a lot of the workload will be on more recent partitions. That's desirable sometimes (e.g. for vacuum) but not always desirable for parallelism. Hash partitioning doesn't have these issues and goes very nicely with parallel query. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Sat, May 13, 2017 at 12:52 AM, Amit Kapila wrote: >> Can we think of defining separate portable hash functions which can be >> used for the purpose of hash partitioning? > I think that would be a good idea. I think it shouldn't even be that > hard. By data type: > - Integers. We'd need to make sure that we get the same results for > the same value on big-endian and little-endian hardware, and that > performance is good on both systems. That seems doable. > - Floats. There may be different representations in use on different > hardware, which could be a problem. Tom didn't answer my question > about whether any even-vaguely-modern hardware is still using non-IEEE > floats, which I suspect means that the answer is "no". If every bit > of hardware we are likely to find uses basically the same > representation of the same float value, then this shouldn't be hard. > (Also, even if this turns out to be hard for floats, using a float as > a partitioning key would be a surprising choice because the default > output representation isn't even unambiguous; you need > extra_float_digits for that.) > - Strings. There's basically only one representation for a string. > If we assume that the hash code only needs to be portable across > hardware and not across encodings, a position for which I already > argued upthread, then I think this should be manageable. Basically, this is simply saying that you're willing to ignore the hard cases, which reduces the problem to one of documenting the portability limitations. You might as well not even bother with worrying about the integer case, because porting between little- and big-endian systems is surely far less common than cases you've already said you're okay with blowing off. That's not an unreasonable position to take, perhaps; doing better than that is going to be a lot more work and it's not very clear how much real-world benefit results. 
But I can't follow the point of worrying about endianness but not encoding. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 10:34 AM, Tom Lane wrote: > Maintaining such a property for float8 (and the types that depend on it) > might be possible if you believe that nobody ever uses anything but IEEE > floats, but we've never allowed that as a hard assumption before. This is not such a big practical problem (for me at least) because hashing of floats is of dubious value. > Even architecture dependence isn't the whole scope of the problem. > Consider for example dumping a LATIN1-encoded database and trying > to reload it into a UTF8-encoded database. People will certainly > expect that to be possible, and do you want to guarantee that the > hash of a text value is encoding-independent? That is a major problem. In an ideal world, we could make that work with something like ucol_getSortKey(), but we don't require ICU, and we can't mix getSortKey() with strxfrm(), or even strxfrm() results from different platforms. I don't think it's correct to hash the code points, either, because strings may be considered equal in a locale even if the code points aren't identical. But I don't think postgres lives up to that standard currently. But hash partitioning is too valuable to give up on entirely. I think we should consider supporting a limited subset of types for now with something not based on the hash am. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
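A small sketch of the code-point problem Jeff mentions: under Unicode canonical equivalence (which ICU collations respect), precomposed 'é' (U+00E9) and 'e' plus a combining acute accent (U+0065 U+0301) compare equal, yet a hash over the code points or bytes sees different input. FNV-1a stands in here for whatever hash one might actually use:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in byte hash (FNV-1a). */
static uint64_t fnv1a(const unsigned char *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* The "same" string under canonical equivalence, both in UTF-8: */
static const unsigned char e_nfc[] = { 0xC3, 0xA9 };        /* U+00E9 */
static const unsigned char e_nfd[] = { 0x65, 0xCC, 0x81 };  /* U+0065 U+0301 */
```

So a collation that treats these as equal, combined with a hash that treats them as different, would break the hash-opclass requirement that equal values hash equally.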
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 12:52 AM, Amit Kapila wrote: > Can we think of defining separate portable hash functions which can be > used for the purpose of hash partitioning? I think that would be a good idea. I think it shouldn't even be that hard. By data type: - Integers. We'd need to make sure that we get the same results for the same value on big-endian and little-endian hardware, and that performance is good on both systems. That seems doable. - Floats. There may be different representations in use on different hardware, which could be a problem. Tom didn't answer my question about whether any even-vaguely-modern hardware is still using non-IEEE floats, which I suspect means that the answer is "no". If every bit of hardware we are likely to find uses basically the same representation of the same float value, then this shouldn't be hard. (Also, even if this turns out to be hard for floats, using a float as a partitioning key would be a surprising choice because the default output representation isn't even unambiguous; you need extra_float_digits for that.) - Strings. There's basically only one representation for a string. If we assume that the hash code only needs to be portable across hardware and not across encodings, a position for which I already argued upthread, then I think this should be manageable. - Everything Else. Basically, everything else is just a composite of that stuff, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
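For the integer case, the usual trick is to serialize the value into a fixed byte order before hashing, so big- and little-endian machines both hash the logical value rather than its in-memory representation. A minimal sketch (FNV-1a is an illustrative stand-in, not the hash PostgreSQL would use):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-in byte hash (FNV-1a). */
static uint64_t fnv1a(const unsigned char *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Serialize to a fixed (little-endian) byte order first, so the result
 * does not depend on the machine's native endianness. */
static uint64_t portable_hash_int64(int64_t value)
{
    unsigned char buf[8];
    uint64_t u = (uint64_t) value;

    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) (u >> (8 * i));
    return fnv1a(buf, sizeof buf);
}
```

The shift-based serialization costs a few instructions on big-endian hardware and compiles to essentially nothing on little-endian hardware, which matches the "performance is good on both systems" requirement above.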
Re: [HACKERS] Hash Functions
On Sat, May 13, 2017 at 1:08 AM, Robert Haas wrote: > On Fri, May 12, 2017 at 2:45 PM, Tom Lane wrote: > > Maybe a shorter argument for hash partitioning is that not one but two > different people proposed patches for it within months of the initial > partitioning patch going in. When multiple people are thinking about > implementing the same feature almost immediately after the > prerequisite patches land, that's a good clue that it's a desirable > feature. So I think we should try to solve the problems, rather than > giving up. > Can we think of defining separate portable hash functions which can be used for the purpose of hash partitioning? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 2017-05-12 21:56:30 -0400, Robert Haas wrote: > Cheap isn't free, though. It's got a double-digit percentage overhead > rather than a large-multiple-of-the-runtime overhead as triggers do, > but people still won't want to pay it unnecessarily, I think. That should be partially addressable with reasonable amounts of engineering, though. Computing the target partition in a hash-partitioned table can be implemented very efficiently, and adding infrastructure for multiple bulk insert targets in COPY should be quite doable as well. It's also work that's generally useful, not just for backups. The bigger issue to me here wrt pg_dump is that partitions can be restored in parallel, but that'd probably not work as well if dumped separately. Unless we'd do the re-routing on load, rather than when dumping - which'd also increase cache locality, by having, most of the time (same architecture/encoding/etc), one backend insert into the same partition. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 7:36 PM, David Fetter wrote: > On Fri, May 12, 2017 at 06:38:55PM -0400, Peter Eisentraut wrote: >> On 5/12/17 18:13, Alvaro Herrera wrote: >> > I think for logical replication the tuple should appear as being in the >> > parent table, not the partition. No? >> >> Logical replication replicates base table to base table. How those >> tables are tied together into a partitioned table or an inheritance tree >> is up to the system catalogs on each side. > > This seems like a totally reasonable approach to pg_dump, especially > in light of the fact that logical replication already (and quite > reasonably) does it this way. Hard work has been done to make > tuple-routing cheap, and this is one of the payoffs. Cheap isn't free, though. It's got a double-digit percentage overhead rather than a large-multiple-of-the-runtime overhead as triggers do, but people still won't want to pay it unnecessarily, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 06:38:55PM -0400, Peter Eisentraut wrote: > On 5/12/17 18:13, Alvaro Herrera wrote: > > I think for logical replication the tuple should appear as being in the > > parent table, not the partition. No? > > Logical replication replicates base table to base table. How those > tables are tied together into a partitioned table or an inheritance tree > is up to the system catalogs on each side. This seems like a totally reasonable approach to pg_dump, especially in light of the fact that logical replication already (and quite reasonably) does it this way. Hard work has been done to make tuple-routing cheap, and this is one of the payoffs. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 5/12/17 18:13, Alvaro Herrera wrote: > I think for logical replication the tuple should appear as being in the > parent table, not the partition. No? Logical replication replicates base table to base table. How those tables are tied together into a partitioned table or an inheritance tree is up to the system catalogs on each side. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Peter Eisentraut wrote: > On 5/12/17 14:23, Robert Haas wrote: > > One alternative would be to change the way that we dump and restore > > the data. Instead of dumping the data with the individual partitions, > > dump it all out for the parent and let tuple routing sort it out at > > restore time. > > I think this could be a pg_dump option. One way it dumps out the > partitions, and another way it recomputes the partitions. I think that > could be well within pg_dump's mandate. I was thinking the same, but enable that option automatically for hash partitioning. Each partition is still dumped separately, but instead of referencing the specific partition in the TABLE DATA object, reference the parent table. This way, the restore can still be parallelized, but tuple routing is executed for each tuple. > (cough ... logical replication ... cough) I think for logical replication the tuple should appear as being in the parent table, not the partition. No? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On 5/12/17 14:23, Robert Haas wrote: > One alternative would be to change the way that we dump and restore > the data. Instead of dumping the data with the individual partitions, > dump it all out for the parent and let tuple routing sort it out at > restore time. I think this could be a pg_dump option. One way it dumps out the partitions, and another way it recomputes the partitions. I think that could be well within pg_dump's mandate. (cough ... logical replication ... cough) > Of course, this isn't very satisfying because now > dump-and-restore hasn't really preserved the state of the database; That depends on whether you consider the partitions to be a user-visible detail or an internal one. The current approach is sort of in the middle, so it makes sense to allow the user to interpret it either way depending on need. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 2:45 PM, Tom Lane wrote: > Yeah, that isn't really appetizing at all. If we were doing physical > partitioning below the user-visible level, we could make it fly. > But the existing design makes the partition boundaries user-visible > which means we have to insist that the partitioning rule is immutable > (in the strongest sense of being invariant across all installations > you could possibly wish to transport data between). I think you're right. > Now, we already have some issues of that sort with range partitioning; > if say you try to transport data to another system with a different > sort ordering, you have probably got problems. But that issue already > existed with the old manual partitioning approach, ie if you have a > CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer > to such a system could fail already. So I'm okay with that. But it > seems like hash partitioning is going to introduce new, exciting, and > hard-to-document-or-avoid portability gotchas. Do we really need > to go there? That is a good question. I think it basically amounts to this question: is hash partitioning useful, and if so, for what? Range and list partitioning inherently provide management benefits. For example, if you range-partition your data by year, then when you want to get rid of the oldest year worth of data, you can drop the entire partition at once, which is likely to be much faster than a DELETE statement. But hash partitioning provides no such benefits because, by design, the distribution of the data among the partitions is essentially random. Dropping one partition will not usually be a useful thing to do because the rows in that partition are logically unconnected.
So, if you have a natural way of partitioning a table by range or list, that's probably going to be superior to hash partitioning. But sometimes there isn't an obviously useful way of breaking up the data, and you still want to partition it in order to reduce the size of the individual tables. That, in turn, allows maintenance operations to be performed on each partition separately. For example, suppose your table is big enough that running CLUSTER or VACUUM FULL on it takes eight hours, but you can't really afford an eight-hour maintenance window when the table gets bloated. However, you can afford (on Sunday nights when activity is lowest) a window of, say, one hour. Well, if you hash-partition the table, you can CLUSTER or VACUUM FULL one partition a week for N weeks until you get to them all. Similarly, if you need to create an index, you can build it on one partition at a time. You can even add the index to one partition to see how well it works -- for example, does it turn too many HOT updates into non-HOT updates, causing bloat? -- and try it out before you go do it across the board. And an unpartitioned table can only accommodate one VACUUM process at a time, but a partitioned table can have one per partition. Partitioning a table also means that you don't need a single disk volume large enough to hold the whole thing; instead, you can put each partition in a separate tablespace or (in some exciting future world where PostgreSQL looks more like a distributed system) perhaps even on another server. For a table that is a few tens of gigabytes, none of this amounts to a hill of beans, but for a table that is a few tens of terabytes, the time it takes to perform administrative operations can become a really big problem. A good example is, say, a table full of orders. Imagine a high-velocity business like Amazon or UPS that has a constant stream of new orders, or a mobile app that has a constant stream of new user profiles.
If that data grows fast enough that the table needs to be partitioned, how should it be done? It's often desirable to create partitions of about equal size and about equal hotness, and range-partitioning on the creation date or order ID number means continually creating new partitions, and having all of the hot data in the same partition. In my experience, it is *definitely* the case that users with very large tables are experiencing significant pain right now. The freeze map changes were a good step towards ameliorating some of that pain, but I think hash partitioning is another step in that direction, and I think there will be other applications as well. Even for users who don't have data that large, there are also interesting query-performance implications of hash partitioning. Joins between very large tables tend to get implemented as merge joins, which often means sorting all the data, which is O(n lg n) and not easy to parallelize. But if those very large tables happened to be compatibly partitioned on the join key, you could do a partitionwise join per the patch Ashutosh proposed, which would be cheaper and easier to do in parallel. Maybe a shorter argument for hash partitioning is that not one but two different people proposed patches for it within
months of the initial partitioning patch going in. When multiple people are thinking about implementing the same feature almost immediately after the prerequisite patches land, that's a good clue that it's a desirable feature. So I think we should try to solve the problems, rather than giving up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 02:23:14PM -0400, Robert Haas wrote: > > What about integers? I think we're already assuming two's-complement > arithmetic, which I think means that the only problem with making the > hash values portable for integers is big-endian vs. little-endian. > That sounds solvable-ish. > xxhash produces identical hashes independent of big-endian or little-endian architecture. https://github.com/Cyan4973/xxHash Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Fri, May 12, 2017 at 1:34 PM, Tom Lane wrote: >> I'd vote that it's not, which means that this whole approach to hash >> partitioning is unworkable. I agree with Andres that demanding hash >> functions produce architecture-independent values will not fly. > If we can't produce architecture-independent hash values, then what's > the other option? Forget hash partitioning. There's no law saying that that's a good idea and we have to have it. With a different set of constraints, maybe we could do it, but I think the existing design decisions have basically locked it out --- and I doubt that hash partitioning is so valuable that we should jettison other desirable properties to get it. > One alternative would be to change the way that we dump and restore > the data. Instead of dumping the data with the individual partitions, > dump it all out for the parent and let tuple routing sort it out at > restore time. Of course, this isn't very satisfying because now > dump-and-restore hasn't really preserved the state of the database; > indeed, you could make it into a hard failure by creating a foreign > key referencing a hash partition. Yeah, that isn't really appetizing at all. If we were doing physical partitioning below the user-visible level, we could make it fly. But the existing design makes the partition boundaries user-visible, which means we have to insist that the partitioning rule is immutable (in the strongest sense of being invariant across all installations you could possibly wish to transport data between). Now, we already have some issues of that sort with range partitioning; if, say, you try to transport data to another system with a different sort ordering, you have probably got problems. But that issue already existed with the old manual partitioning approach, i.e. if you have a CHECK constraint like "(col >= 'foo' AND col < 'gob')" then a transfer to such a system could fail already. So I'm okay with that. 
But it seems like hash partitioning is going to introduce new, exciting, and hard-to-document-or-avoid portability gotchas. Do we really need to go there? regards, tom lane
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 1:34 PM, Tom Lane wrote: > I'd vote that it's not, which means that this whole approach to hash > partitioning is unworkable. I agree with Andres that demanding hash > functions produce architecture-independent values will not fly. If we can't produce architecture-independent hash values, then what's the other option? One alternative would be to change the way that we dump and restore the data. Instead of dumping the data with the individual partitions, dump it all out for the parent and let tuple routing sort it out at restore time. Of course, this isn't very satisfying because now dump-and-restore hasn't really preserved the state of the database; indeed, you could make it into a hard failure by creating a foreign key referencing a hash partition. After dump-and-restore, the row ends up in some other partition and the foreign key can't be recreated because the relationship no longer holds. This isn't limited to foreign keys, either; similar problems could be created with CHECK constraints or other per-table properties that can vary between one child and another. I basically think it's pretty futile to suppose that we can get away with having a dump and restore move rows around between partitions without having that blow up in some cases. > Maintaining such a property for float8 (and the types that depend on it) > might be possible if you believe that nobody ever uses anything but IEEE > floats, but we've never allowed that as a hard assumption before. I don't know how standard that is. Is there any hardware that anyone's likely to be using that doesn't? TBH, I don't really care if support for obscure, nearly-dead platforms like VAX or whatever doesn't quite work with hash-partitioned tables. In practice, PostgreSQL only sorta works on that kind of platform anyway; there are far bigger problems than this. On the other hand, if there are servers being shipped in 2017 that don't use IEEE floats, that's another problem. 
What about integers? I think we're already assuming two's-complement arithmetic, which I think means that the only problem with making the hash values portable for integers is big-endian vs. little-endian. That sounds solvable-ish. > Even architecture dependence isn't the whole scope of the problem. > Consider for example dumping a LATIN1-encoded database and trying > to reload it into a UTF8-encoded database. People will certainly > expect that to be possible, and do you want to guarantee that the > hash of a text value is encoding-independent? No, I think that's expecting too much. I'd be just fine telling people that if you hash-partition on a text column, it may not load into a database with another encoding. If you care about that, don't use hash-partitioning, or don't change the encoding, or dump out the partitions one by one and reload all the rows into the parent table for re-routing, solving whatever problems come up along the way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On 05/12/2017 10:17 AM, Robert Haas wrote: > On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote: >> Given that a lot of data types have an architecture-dependent >> representation, it seems somewhat unrealistic and expensive to have >> a hard rule to keep them architecture agnostic. And if that's not >> guaranteed, then I'm doubtful it makes sense as a soft rule >> either. > > That's a good point, but the flip side is that, if we don't have > such a rule, a pg_dump of a hash-partitioned table on one > architecture might fail to restore on another architecture. Today, I > believe that, while the actual database cluster is > architecture-dependent, a pg_dump is architecture-independent. Is it > OK to lose that property? Not from where I sit. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] Hash Functions
Robert Haas writes: > On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote: >> Given that a lot of data types have an architecture-dependent representation, >> it seems somewhat unrealistic and expensive to have a hard rule to keep them >> architecture agnostic. And if that's not guaranteed, then I'm doubtful it >> makes sense as a soft rule either. > That's a good point, but the flip side is that, if we don't have such > a rule, a pg_dump of a hash-partitioned table on one architecture > might fail to restore on another architecture. Today, I believe that, > while the actual database cluster is architecture-dependent, a pg_dump > is architecture-independent. Is it OK to lose that property? I'd vote that it's not, which means that this whole approach to hash partitioning is unworkable. I agree with Andres that demanding hash functions produce architecture-independent values will not fly. Maintaining such a property for float8 (and the types that depend on it) might be possible if you believe that nobody ever uses anything but IEEE floats, but we've never allowed that as a hard assumption before. Even architecture dependence isn't the whole scope of the problem. Consider for example dumping a LATIN1-encoded database and trying to reload it into a UTF8-encoded database. People will certainly expect that to be possible, and do you want to guarantee that the hash of a text value is encoding-independent? regards, tom lane
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 1:12 PM, Andres Freund wrote: > Given that a lot of data types have an architecture-dependent representation, > it seems somewhat unrealistic and expensive to have a hard rule to keep them > architecture agnostic. And if that's not guaranteed, then I'm doubtful it > makes sense as a soft rule either. That's a good point, but the flip side is that, if we don't have such a rule, a pg_dump of a hash-partitioned table on one architecture might fail to restore on another architecture. Today, I believe that, while the actual database cluster is architecture-dependent, a pg_dump is architecture-independent. Is it OK to lose that property? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Functions
On May 12, 2017 10:05:56 AM PDT, Robert Haas wrote: > On Fri, May 12, 2017 at 12:08 AM, Jeff Davis wrote: >> 1. The hash functions as they exist today aren't portable -- they can >> return different results on different machines. That means using these >> functions for hash partitioning would yield different contents for the >> same partition on different architectures (and that's bad, considering >> they are logical partitions and not some internal detail). > Hmm, yeah, that is bad. Given that a lot of data types have an architecture-dependent representation, it seems somewhat unrealistic and expensive to have a hard rule to keep them architecture agnostic. And if that's not guaranteed, then I'm doubtful it makes sense as a soft rule either. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 12:08 AM, Jeff Davis wrote: > 1. The hash functions as they exist today aren't portable -- they can > return different results on different machines. That means using these > functions for hash partitioning would yield different contents for the > same partition on different architectures (and that's bad, considering > they are logical partitions and not some internal detail). Hmm, yeah, that is bad. > We could revert 26043592 (copied Tom because that was his patch) to > make hash_any() go back to being portable -- do we know what that > speedup actually was? Maybe the benefit is smaller on newer > processors? Another option is to try to do some combination of > byteswapping and word-at-a-time, which might be better than > byte-at-a-time if the byteswapping is done with a native instruction. With regard to portability, I find that in 2009, according to Tom, we had "already agreed" that it was dispensable: http://postgr.es/m/23832.1234214...@sss.pgh.pa.us I was not able to find where that was agreed. On performance, I found this: https://www.postgresql.org/message-id/20081104202655.gp18...@it.is.rice.edu It says at the end: "The average time to reindex the table using our current hash_any() without the separate mix()/final() was 1696ms and 1482ms with the separate mix()/final() stages giving almost 13% better performance for this stupid metric." > 5. For salts[2], I don't think it's too hard to support them in an > optional way. We just allow the function to be a two-argument function > with a default. Places that care about specifying the salt may do so > if the function has pronargs==2, otherwise it just gets the default > value. If we have salts, I don't think having 64-bit hashes is very > important. If we run out of bits, we can just salt the hash function > differently and get more hash bits. This is not urgent and I believe > we should just implement salts when and if some algorithm needs them. 
The potential problem with that is that the extra argument might slow down the hash functions enough to matter. Argument unpacking is not free, and the hashing logic itself will get more complex. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company