Hello,

I am creating a large database of MD5 hash values. I am a relative newb with PostgreSQL (or any database, for that matter). The schema and operation will be quite simple -- only a few tables, probably no stored procedures -- but I may easily end up with several hundred million rows of hash values, possibly even billions. The hash values will be organized into logical sets, with a many-to-many relationship between hashes and sets.

I have some questions before I set out on this endeavor, however, and would appreciate any and all feedback, including SWAGs, WAGs, and outright lies. :-)

I am trying to batch up operations as much as possible, so I will largely be doing comparisons of whole sets, with bulk COPY importing. I hope to avoid single hash value lookups as much as possible.
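To make the sizes and the whole-set comparisons concrete, here is a minimal Python sketch. It is only an illustration and not part of the actual setup: the input strings, the helper `md5_digest`, and the set names are all made up. It shows that a raw MD5 digest is 16 bytes (32 characters as hex text), that those 16 bytes happen to fit in a `uuid.UUID` value, and what comparing two hash sets as a whole looks like.

```python
import hashlib
import uuid


def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of data (hypothetical helper)."""
    return hashlib.md5(data).digest()


# Two hypothetical hash sets built from made-up inputs.
set_a = {md5_digest(s) for s in (b"alpha", b"beta", b"gamma")}
set_b = {md5_digest(s) for s in (b"beta", b"gamma", b"delta")}

# Whole-set comparisons rather than single-value lookups.
common = set_a & set_b      # hashes present in both sets
only_in_a = set_a - set_b   # hashes present only in set_a

# Size of one value in different representations.
sample = md5_digest(b"alpha")
as_hex = sample.hex()               # hex text form of the digest
as_uuid = uuid.UUID(bytes=sample)   # the same 16 bytes viewed as a UUID

print(len(sample), len(as_hex), len(common), len(only_in_a))
```

The hex form is twice the size of the raw digest, which is part of why the choice of column type seems worth asking about at this row count.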
1. Which datatype should I use to represent the hash value? UUIDs are also 16 bytes...
2. Does it make sense to denormalize the hash set relationships?
3. Should I index?
4. What other data structure options would it make sense for me to choose?

Thanks in advance,
Jon

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)