On 06/13/2013 05:35 PM, Dean Rasheed wrote: > Hi, > > Attached is a patch implementing a new aggregate function md5_agg() to > compute the aggregate MD5 sum across a number of rows. This is > something I've wished for a number of times. I think the primary use > case is to do a quick check that 2 tables, possibly on different > servers, contain the same data, using a query like > > SELECT md5_agg(foo.*::text) FROM (SELECT * FROM foo ORDER BY id) foo; > > or > > SELECT md5_agg(foo.*::text ORDER BY id) FROM foo;
That's a very useful thing to be able to do, but I'm hesitant to make the fact that it uses md5 too prominent in the name if it doesn't produce a result that an external user could reasonably expect from md5'ing the same data. I imagine having an md5_agg(text) and md5(bytea) that was the more efficient, streaming equivalent of: md5(string_agg(the_col,'')) would be rather handy. It'd be less useful for other types (floats, integers, etc) unless we had a way to get the binary representations of those in a well defined form, like int8le(1) . Casting to 'text' would be sufficient for most of the purposes I can imagine, though, and for those that it wouldn't things would quickly get so complicated that you'd want to be using a PL/something function anyway. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers